From andrew at beekhof.net Tue Sep 1 06:33:57 2009 From: andrew at beekhof.net (Andrew Beekhof) Date: Tue, 1 Sep 2009 08:33:57 +0200 Subject: [Linux-cluster] Problem with Pacemaker and Corosync In-Reply-To: <6e4c20e70908310612o120933cema2609513f13be78c@mail.gmail.com> References: <6e4c20e70908310612o120933cema2609513f13be78c@mail.gmail.com> Message-ID: try turning on debug, there's nothing in the logs that indicate why the lrmd is having a problem On Mon, Aug 31, 2009 at 3:12 PM, Thomas Georgiou wrote: > Hi, > > I have installed Pacemaker 1.0.5, Corosync 1.0.0, and Openais 1.0.1 > from source according to the Clusterlabs docs. ?However, when I go to > start corosync/pacemaker, I get error messages pertaining to lrm and > cibadmin -Q hangs and complains that the remote node is not available. > ?Attached is the corosync log. > > Any ideas? > > Thomas Georgiou > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Tue Sep 1 07:00:53 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 01 Sep 2009 08:00:53 +0100 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <29ae894c0908281038i20f408f7gf14483ad6f73ca5e@mail.gmail.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <29ae894c0908281038i20f408f7gf14483ad6f73ca5e@mail.gmail.com> Message-ID: <4A9CC6A5.3090502@redhat.com> On 28/08/09 18:38, brem belguebli wrote: > Hi > the clusternodes defined in cluster.conf are : > > node1.mydomain > node2.mydomain > > which correpond to the bond0 interfaces on both nodes. > > I expect to use node1-hb and node2-hb as heartbeat interfaces. (bond1) > > I may have misunderstood something, but are you telling me that I have > to use the nodeX-hb as clusternodes in cluster.conf ? Yes, that's exactly what you need to do. Chrissie > Brem > > 2009/8/28 Christine Caulfield > > > On 28/08/09 15:24, brem belguebli wrote: > > Hi Chrissie, > Are you pointing me to the paragraph "what's the right way to > ....eth0 ?" > I've tried this at first adding adding a suffix to the > interfaces but > nothing happened. Suffix -p may be hardcoded (I've used -hb) > Here's an outputof my /etc/hosts (identical on both nodes): > 10.146.15.184 node1 node1.mydomain > 10.146.15.175 node2 node2.mydomain > 192.168.84.50 node1-hb > 192.168.84.51 node2-hb > Still using bond0 .... > Brem > > > > The suffix isn't hard-coded or anything to do with cman really, it's > just a way of distinguishing interfaces. > > You need to edit cluster.conf to tell it to use the different name. > > If you get despertate then you can always put the IP address in > cluster.conf, but the output from cman_tool nodes doesn't look as nice! 
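For illustration, the change being described amounts to a clusternodes section along these lines, using the -hb names from this thread; the nodeids and votes are placeholders and this is not a complete, tested cluster.conf:

  <clusternodes>
    <!-- cman resolves these names, so cluster membership and heartbeat
         traffic uses the 192.168.84.x bond1 addresses from /etc/hosts -->
    <clusternode name="node1-hb" nodeid="1" votes="1"/>
    <clusternode name="node2-hb" nodeid="2" votes="1"/>
  </clusternodes>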
> > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Tue Sep 1 07:01:38 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 01 Sep 2009 08:01:38 +0100 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <4A97F090.9080508@redhat.com> <29ae894c0908281045l6c93c7dbo2d0e4f27c5bab14e@mail.gmail.com> <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> Message-ID: <4A9CC6D2.1000408@redhat.com> On 31/08/09 08:56, brem belguebli wrote: > Hi, > I was wondering if there is a way with cman to configure 2 heartbeat > channels (let's say my prod bond0 and my outband bond1) as it seems to > be possible with openais and their redundant ring interfaces configuration. > Brem > Yes, there's an item about it on the FAQ page I mentioned earlier. I hope it's more complete the the last one! Chrissie > 2009/8/28, brem belguebli >: > > Ok, > > It answers my last question. > > I have been confused by some mention somewhere in a post or doc > saying that cman has a built-in algorithm to determine, just by > adding entries in /etc/hosts, the right interfaces to use . > > Brem > > > > 2009/8/28 Christine Caulfield > > > On 28/08/09 15:53, Christine Caulfield wrote: > > On 28/08/09 15:24, brem belguebli wrote: > > Hi Chrissie, > Are you pointing me to the paragraph "what's the right > way to ....eth0 ?" > I've tried this at first adding adding a suffix to the > interfaces but > nothing happened. Suffix -p may be hardcoded (I've used -hb) > Here's an outputof my /etc/hosts (identical on both nodes): > 10.146.15.184 node1 node1.mydomain > 10.146.15.175 node2 node2.mydomain > 192.168.84.50 node1-hb > 192.168.84.51 node2-hb > Still using bond0 .... > Brem > > > > The suffix isn't hard-coded or anything to do with cman > really, it's > just a way of distinguishing interfaces. > > You need to edit cluster.conf to tell it to use the > different name. > > If you get despertate then you can always put the IP address in > cluster.conf, but the output from cman_tool nodes doesn't > look as nice! > > > I haven't read that article in detail before but you're right, > it makes no mention of changing cluster.conf! > > I've fixed that now, thank you. 
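For reference, the FAQ item in question describes a second ring roughly like the sketch below, one <altname> per node in cluster.conf. The names, port and multicast address here are placeholders, and since this feature was not fully supported everywhere at the time, the wiki page and the installed cman version should be checked before relying on it:

  <clusternode name="node1" nodeid="1" votes="1">
    <!-- second, redundant ring over the out-of-band network -->
    <altname name="node1-hb" port="6809" mcast="239.192.99.2"/>
  </clusternode>
  <clusternode name="node2" nodeid="2" votes="1">
    <altname name="node2-hb" port="6809" mcast="239.192.99.2"/>
  </clusternode>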
> > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From brem.belguebli at gmail.com Tue Sep 1 07:53:17 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Tue, 1 Sep 2009 09:53:17 +0200 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <4A9CC6D2.1000408@redhat.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <4A97F090.9080508@redhat.com> <29ae894c0908281045l6c93c7dbo2d0e4f27c5bab14e@mail.gmail.com> <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> <4A9CC6D2.1000408@redhat.com> Message-ID: <29ae894c0909010053g15ef0c30y513d5501b776bbf2@mail.gmail.com> Hello Chrissie, I couldn't find the item in the doc (the CMAN FAQ). Brem 2009/9/1, Christine Caulfield : > > On 31/08/09 08:56, brem belguebli wrote: > >> Hi, >> I was wondering if there is a way with cman to configure 2 heartbeat >> channels (let's say my prod bond0 and my outband bond1) as it seems to >> be possible with openais and their redundant ring interfaces >> configuration. >> Brem >> >> > Yes, there's an item about it on the FAQ page I mentioned earlier. I hope > it's more complete the the last one! > > Chrissie > > 2009/8/28, brem belguebli > >: >> >> Ok, >> >> It answers my last question. >> >> I have been confused by some mention somewhere in a post or doc >> saying that cman has a built-in algorithm to determine, just by >> adding entries in /etc/hosts, the right interfaces to use . >> >> Brem >> >> >> >> 2009/8/28 Christine Caulfield > > >> >> On 28/08/09 15:53, Christine Caulfield wrote: >> >> On 28/08/09 15:24, brem belguebli wrote: >> >> Hi Chrissie, >> Are you pointing me to the paragraph "what's the right >> way to ....eth0 ?" >> I've tried this at first adding adding a suffix to the >> interfaces but >> nothing happened. Suffix -p may be hardcoded (I've used >> -hb) >> Here's an outputof my /etc/hosts (identical on both nodes): >> 10.146.15.184 node1 node1.mydomain >> 10.146.15.175 node2 node2.mydomain >> 192.168.84.50 node1-hb >> 192.168.84.51 node2-hb >> Still using bond0 .... >> Brem >> >> >> >> The suffix isn't hard-coded or anything to do with cman >> really, it's >> just a way of distinguishing interfaces. >> >> You need to edit cluster.conf to tell it to use the >> different name. >> >> If you get despertate then you can always put the IP address in >> cluster.conf, but the output from cman_tool nodes doesn't >> look as nice! >> >> >> I haven't read that article in detail before but you're right, >> it makes no mention of changing cluster.conf! >> >> I've fixed that now, thank you. 
>> >> >> Chrissie >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> >> >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Tue Sep 1 09:08:45 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Tue, 01 Sep 2009 11:08:45 +0200 Subject: [Linux-cluster] cluster.conf in another place ? Message-ID: <4A9CE49D.8020504@bull.net> Hi, I have this cman version : cman-3.0.0-15.rc1.fc11.x86_64 is it possible to put the cluster.conf in another place than /etc/cluster/. and if so, how can I tell it to cman ? Thanks Alain -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Tue Sep 1 09:15:01 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 01 Sep 2009 11:15:01 +0200 Subject: [Linux-cluster] cluster.conf in another place ? In-Reply-To: <4A9CE49D.8020504@bull.net> References: <4A9CE49D.8020504@bull.net> Message-ID: <1251796501.339.42.camel@cerberus.int.fabbione.net> On Tue, 2009-09-01 at 11:08 +0200, Alain.Moulle wrote: > Hi, > I have this cman version : > cman-3.0.0-15.rc1.fc11.x86_64 > is it possible to put the cluster.conf in another place > than /etc/cluster/. > and if so, how can I tell it to cman ? > Thanks > Alain The exact same way I already explained to you before: http://www.redhat.com/archives/linux-cluster/2009-August/msg00260.html Cheers Fabio From jakov.sosic at srce.hr Tue Sep 1 09:19:54 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 1 Sep 2009 11:19:54 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <4A9C3EFF.1000006@nerd.com> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3EFF.1000006@nerd.com> Message-ID: <20090901111954.57a7a8ac@nb-jsosic> On Mon, 31 Aug 2009 14:22:07 -0700 Rick Stevens wrote: > I don't see that there's anything to fix. You had a three-node > cluster so you needed a majority of nodes up to maintain a quorum. > One node died, killing quorum and thus stopping the cluster Nope. Quorum is still there. I have 3 nodes with qdisk, and two nodes remained in quorum. Then, I had to reboot the nodes because of some multipath/scsi changes, and after that, they only try to fence the missing node, and they can't get to it's fencing device, and rgmanager is not showing in my output. Quorum is regained after both nodes restarted. So, bassically what I mean is that you cannot start cluster with one node and it's fence device missing, although you have gained quorum. 2 nodes and qdisk is much more than I need - I need only one node + qdisk for cluster to function properly. > As a three-node cluster, it's dead. > It can't be run as a three-node cluster until the third node is > fixed. Those are the rules. Well this is the part that I don't like :) Why can't I for example put 10 missing nodes in my cluster.conf - if other nodes don't gain quorum, they shouldn't start services and that's it, but if they do gain quorum, what's the point of constantly trying to fence missing fence device of missing node?! 
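For reference, the manual fallback suggested further down in this thread looks roughly like the sketch below in cluster.conf: a second fence method per node that is only tried when the primary (iLO) method fails because the whole machine, including its management processor, is gone. All names and credentials are placeholders:

  <clusternode name="node3" nodeid="3" votes="1">
    <fence>
      <method name="1">
        <device name="node3-ilo"/>
      </method>
      <!-- last resort: blocks until an operator acknowledges -->
      <method name="2">
        <device name="manual" nodename="node3"/>
      </method>
    </fence>
  </clusternode>

  <fencedevices>
    <fencedevice agent="fence_ilo" name="node3-ilo" hostname="node3-ilo.mydomain" login="admin" passwd="secret"/>
    <fencedevice agent="fence_manual" name="manual"/>
  </fencedevices>

With fence_manual the fence operation waits until an operator confirms with fence_ack_manual, so it should never be the only method, just the one that lets a quorate cluster be brought back by hand when a node and its iLO have physically left the building.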
> A two node cluster requires special handling of things to prevent the > dread split-brain situation, which is what two_node does. Running the > surviving nodes as a two-node cluster is, by definition, a > reconfiguration. I'd say simply requiring you to set two_node is > pretty damned innocuous to let you run a dead (ok, mortally wounded) > cluster. > > If you pulled a drive out of a RAID6--thus degrading it to a RAID5-- > would you complain because it didn't remain a RAID6? First of all, RAID6 without one disk _IS NOT_ RAID5. In terms of redundancy they are the same, but on disk data is not the same, so that two are not equal. And yes - I would complain if I had to _REBUILD_ degraded array to RAID5. And until it's rebuilded, if the array was unavailable - that would be a major issue - what's the point of redundancy then if I loose whole array/cluster when one unit fails? But with RAID6 I don't have to. As a matter of fact, I can loose one more drive, and leave it in that state until I buy new two drives and hotplug them into the chassis. EG.: until quorum is maintained, array and data in it are not jeopardized. With RHCS that should be the same, shouldn't it? I'm just asking, why can't I leave the missing node in the configuration, which will be active once it returns from dealer? Why do I have to reconfig the cluster? That is not a good behaviour IMHO - there should be some command to mark node as missing, and the cluster should work fine with two nodes + qdisk because it has quorum. Isn't that the point of quorum? What's the point of cluster, if one node cannot malfunction, and be taken away to repairs, without the need of setting up a new cluster? In your RAID6 configuration, it's like taking away one disk breaks the array until you rebuild it... -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From jakov.sosic at srce.hr Tue Sep 1 09:21:57 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 1 Sep 2009 11:21:57 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <4A9C421A.8020003@nerd.com> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3EFF.1000006@nerd.com> <4A9C421A.8020003@nerd.com> Message-ID: <20090901112157.1698207c@nb-jsosic> On Mon, 31 Aug 2009 14:35:22 -0700 Rick Stevens wrote: > On re-reading my response, it seemed unintentionally harsh. I didn't > mean any disrespect, sir. I was simply questioning the concept that a > reconfiguration of a cluster shouldn't be required when, indeed the > cluster was being reconfigured. The other response I saw to this > thread regarding planning, and things such as last-man-standing was > much better worded. > > My apologies if it seemed I was jumping down your throat. I wasn't. Come on, no problem at all. We are just discussing, we are not shooting each other with rifles :) If I deserve harsh words, be free to land them on me :) PS.: There is quorum, there is qdisk, so last-man-standing issue is solved, planned whatever. Maybe I wasn't makin' myself clear... -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From jakov.sosic at srce.hr Tue Sep 1 09:26:48 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 1 Sep 2009 11:26:48 +0200 Subject: [Linux-cluster] How to disable node? 
In-Reply-To: <4A9C3FEE.3000309@wol.de> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> Message-ID: <20090901112648.3e32c88d@nb-jsosic> On Mon, 31 Aug 2009 23:26:06 +0200 "Marc - A. Dahlhaus" wrote: > I think your so called 'limitation' is more related to mistakes that > was made during the planing phase of your cluster setup than to > missing functionality. Yeah, and what can be that mistake? I'll feel free to quote John: > The best course of action to take would be to remove that missing > node from your cluster configuration using conga, > system-config-cluster, or by hand > editing /etc/cluster/cluster.conf. As long as it exists in the > configuration then the other nodes will expect it to join the > cluster, and they will attempt to fence it when they try to join the > cluster and see it is not present. Where's the issue with my config there? It seems to be an issue with RHCS misbehaving with one fence device missing. > Please take a look at the qdisk manpage and aditionaly to the cman > faq sections about tiebraker, qdisks and especially the last man > standing setup... qdisk already set up. I never said I lost quorum. I have quorum. But without one node missing completely, with it's fence device, rgmanager just doesn't start up the services, and is not listed in clustat. I repeat, I HAVE GAINED QUORUM, and I have qdisk for the case two out of three are out. -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From kkovachev at varna.net Tue Sep 1 09:44:06 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 1 Sep 2009 12:44:06 +0300 Subject: [Linux-cluster] How to disable node? In-Reply-To: <20090901112648.3e32c88d@nb-jsosic> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> Message-ID: <20090901093910.M73501@varna.net> Hi, On Tue, 1 Sep 2009 11:26:48 +0200, Jakov Sosic wrote > On Mon, 31 Aug 2009 23:26:06 +0200 > "Marc - A. Dahlhaus" wrote: > > > I think your so called 'limitation' is more related to mistakes that > > was made during the planing phase of your cluster setup than to > > missing functionality. > > Yeah, and what can be that mistake? I'll feel free to quote John: > > > The best course of action to take would be to remove that missing > > node from your cluster configuration using conga, > > system-config-cluster, or by hand > > editing /etc/cluster/cluster.conf. As long as it exists in the > > configuration then the other nodes will expect it to join the > > cluster, and they will attempt to fence it when they try to join the > > cluster and see it is not present. > > Where's the issue with my config there? It seems to be an issue with > RHCS misbehaving with one fence device missing. it is not just 'one fence device missing' it is the only fence device that could fence that node, so if you add fence manual as a last resort, you will be able to bring back your cluster to live in such cases > > > Please take a look at the qdisk manpage and aditionaly to the cman > > faq sections about tiebraker, qdisks and especially the last man > > standing setup... > > qdisk already set up. I never said I lost quorum. I have quorum. But > without one node missing completely, with it's fence device, rgmanager > just doesn't start up the services, and is not listed in clustat. 
I > repeat, I HAVE GAINED QUORUM, and I have qdisk for the case two out of > three are out. > > -- > | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | > ================================================================= > | start fighting cancer -> http://www.worldcommunitygrid.org/ | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From mad at wol.de Tue Sep 1 10:29:36 2009 From: mad at wol.de (Marc - A. Dahlhaus [ Administration | Westermann GmbH ]) Date: Tue, 01 Sep 2009 12:29:36 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <20090901112648.3e32c88d@nb-jsosic> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> Message-ID: <1251800976.10463.43.camel@marc> Am Dienstag, den 01.09.2009, 11:26 +0200 schrieb Jakov Sosic: > On Mon, 31 Aug 2009 23:26:06 +0200 > "Marc - A. Dahlhaus" wrote: > > > I think your so called 'limitation' is more related to mistakes that > > was made during the planing phase of your cluster setup than to > > missing functionality. > > Yeah, and what can be that mistake? I'll feel free to quote John: > > > The best course of action to take would be to remove that missing > > node from your cluster configuration using conga, > > system-config-cluster, or by hand > > editing /etc/cluster/cluster.conf. As long as it exists in the > > configuration then the other nodes will expect it to join the > > cluster, and they will attempt to fence it when they try to join the > > cluster and see it is not present. > > Where's the issue with my config there? It seems to be an issue with > RHCS misbehaving with one fence device missing. It isn't misbehaving at all here. The job of RHCS in this case is to save your data against failure. If fenced can't fence a node successfully, RHCS will wait in stalled mode (because it doesn't get a successful response from the fence-agent) until someone who knows what he is doing comes around to fix up the problem. If it wouldn't do it that way a separated node could eat up your data. It is the job of fenced to stop all activities until fencing is in a working shape again. This behaviour is perfectly fine IMO... The mistakes in the planing phase of your cluster setup are: - You use system dependent fencing like "HP iLO" wich will be missing if your system is missing and no independent fencing like an APC PowerSwitch... Think about a power purge which kills booth of your PSU on a system, a system dependent management device would be missing from your network in this case leading to exactly the problem you're faced with. - You haven't read through the related documentation (read on and you spot to what i am referring to). > > Please take a look at the qdisk manpage and aditionaly to the cman > > faq sections about tiebraker, qdisks and especially the last man > > standing setup... > > qdisk already set up. I never said I lost quorum. I have quorum. But > without one node missing completely, with it's fence device, rgmanager > just doesn't start up the services, and is not listed in clustat. I > repeat, I HAVE GAINED QUORUM, and I have qdisk for the case two out of > three are out. Your mistake is that you started fenced in normal mode in which it will fence all nodes that it can't reach to get around a possible split-brain scenario. 
You need to start fenced in "clean start" without fencing mode (read the fenced manpage as it is documented there) because you know everything is right. RHCS can't on it's own know anything about, for it the missing node is separated on network/link layer and could be eating up all your data just fine until it gets fenced. As long as the missing node isn't joining it will not get fenced by the other nodes in clean start node of fenced so it will be your way out of this problem. Marc From esggrupos at gmail.com Tue Sep 1 10:38:54 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 1 Sep 2009 12:38:54 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. Message-ID: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> Hi All, First, sorry if this can be considered Off Topic but my first aproach was using clustering to my problem so I suposse you could have the same problem. I have 2 computers running JBoss and I need to share a directory for the cache (I use OSCache). First I try to use a NFS service on a Red hat Cluster ( I use this reference http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/index.html ) My problem is that the performance with this approach is too much low for my application. So I decided to make each machine use its own cache dir and with rsync keep this dirs synchronized. I don?t know if what I have done is a stupidity or It?s a good solution, so what do you think about it?, Do you know any way to do what I need Thanks in advance. ESG -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakov.sosic at srce.hr Tue Sep 1 10:48:51 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 1 Sep 2009 12:48:51 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <1251800976.10463.43.camel@marc> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> <1251800976.10463.43.camel@marc> Message-ID: <20090901124851.7daf2c75@pc-jsosic.srce.hr> On Tue, 01 Sep 2009 12:29:36 +0200 "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" wrote: > It isn't misbehaving at all here. > > The job of RHCS in this case is to save your data against failure. > > If fenced can't fence a node successfully, RHCS will wait in stalled > mode (because it doesn't get a successful response from the > fence-agent) until someone who knows what he is doing comes around to > fix up the problem. If it wouldn't do it that way a separated node > could eat up your data. It is the job of fenced to stop all > activities until fencing is in a working shape again. > > This behaviour is perfectly fine IMO... Isn't that the mission of quorum? For example - if you have qourum you will run services, if you don't have quorum you won't. If there is a qdisk and single of three nodes is missing, it can't have quorum - so it can't run services? OK I understand that this is the safer way... But that's why I was asking in the first place for a command to flag node as missing completely, so that I can avoid all reconfigurations. Reconfiguration while a node missing will trigger odd behavior when node comes back - node will be fenced constantly because it has wrong config version. > - You use system dependent fencing like "HP iLO" wich will be missing > if your system is missing and no independent fencing like an > APC PowerSwitch... Yes but that are the only devices I have available for fencing. 
So that is the limitation of hardware, on which I don't have any influence in this case. I already know that fence devices are my only SPOF currently... But I can't help myself. > Think about a power purge which kills booth of your PSU on a system, > a system dependent management device would be missing from your > network in this case leading to exactly the problem you're faced > with. I will take a look if APC UPS-es have something like killpower for certain ports, if not I will set up false manual fencing to get around this problem. Thank you. > Your mistake is that you started fenced in normal mode in which it > will fence all nodes that it can't reach to get around a possible > split-brain scenario. You need to start fenced in "clean start" > without fencing mode (read the fenced manpage as it is documented > there) because you know everything is right. Adding clean_start again presumes reconfiguring just like removing a node and declaring cluster a two_node, and I wanted to avoid reconfigurations... Thank you very much. -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From robejrm at gmail.com Tue Sep 1 10:59:36 2009 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Tue, 1 Sep 2009 12:59:36 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> Message-ID: <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> On Tue, Sep 1, 2009 at 12:38 PM, ESGLinux wrote: > Hi All, > First, sorry if this can be considered Off Topic but my first aproach was > using clustering to my problem so I suposse you could have the same problem. > > I have 2 computers running JBoss and I need to share a directory for the > cache (I use OSCache). > > First I try to use a NFS service on a Red hat Cluster ( I use this > reference > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/index.html > ) > Do you have a shared storage? If the answer is yes, just use gfs and mount the filesystem on both machines. Greetings, Juanra > > My problem is that the performance with this approach is too much low for > my application. So I decided to make each machine use its own cache dir and > with rsync keep this dirs synchronized. > > I don?t know if what I have done is a stupidity or It?s a good solution, so > what do you think about it?, Do you know any way to do what I need > > Thanks in advance. > > ESG > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esggrupos at gmail.com Tue Sep 1 11:05:19 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 1 Sep 2009 13:05:19 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. 
In-Reply-To: <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> Message-ID: <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> 2009/9/1 Juan Ramon Martin Blanco > > > On Tue, Sep 1, 2009 at 12:38 PM, ESGLinux wrote: > >> Hi All, >> First, sorry if this can be considered Off Topic but my first aproach was >> using clustering to my problem so I suposse you could have the same problem. >> >> I have 2 computers running JBoss and I need to share a directory for the >> cache (I use OSCache). >> >> First I try to use a NFS service on a Red hat Cluster ( I use this >> reference >> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/index.html >> ) >> > > Do you have a shared storage? If the answer is yes, just use gfs and mount > the filesystem on both machines. > > Nop, I haven?t but your answer makes me a new question. Can I use GFS directly without making a cluster? I mean can I attach the iSCSI devices for example, and mount a GFS filesystem on it without creating a cluster, and a service asociated to this GFS filesystem? Thanks ESG -------------- next part -------------- An HTML attachment was scrubbed... URL: From mad at wol.de Tue Sep 1 11:11:21 2009 From: mad at wol.de (Marc - A. Dahlhaus [ Administration | Westermann GmbH ]) Date: Tue, 01 Sep 2009 13:11:21 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <20090901124851.7daf2c75@pc-jsosic.srce.hr> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> <1251800976.10463.43.camel@marc> <20090901124851.7daf2c75@pc-jsosic.srce.hr> Message-ID: <1251803481.12201.10.camel@marc> Am Dienstag, den 01.09.2009, 12:48 +0200 schrieb Jakov Sosic: > On Tue, 01 Sep 2009 12:29:36 +0200 > "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" > wrote: > > > It isn't misbehaving at all here. > > > > The job of RHCS in this case is to save your data against failure. > > > > If fenced can't fence a node successfully, RHCS will wait in stalled > > mode (because it doesn't get a successful response from the > > fence-agent) until someone who knows what he is doing comes around to > > fix up the problem. If it wouldn't do it that way a separated node > > could eat up your data. It is the job of fenced to stop all > > activities until fencing is in a working shape again. > > > > This behaviour is perfectly fine IMO... > > Isn't that the mission of quorum? For example - if you have qourum you > will run services, if you don't have quorum you won't. If there is a > qdisk and single of three nodes is missing, it can't have quorum - so > it can't run services? > > OK I understand that this is the safer way... But that's why I was > asking in the first place for a command to flag node as missing > completely, so that I can avoid all reconfigurations. Reconfiguration > while a node missing will trigger odd behavior when node comes back - > node will be fenced constantly because it has wrong config version. > > > > - You use system dependent fencing like "HP iLO" wich will be missing > > if your system is missing and no independent fencing like an > > APC PowerSwitch... > > Yes but that are the only devices I have available for fencing. So that > is the limitation of hardware, on which I don't have any influence in > this case. 
I already know that fence devices are my only SPOF > currently... But I can't help myself. > > > > Think about a power purge which kills booth of your PSU on a system, > > a system dependent management device would be missing from your > > network in this case leading to exactly the problem you're faced > > with. > > I will take a look if APC UPS-es have something like killpower for > certain ports, if not I will set up false manual fencing to get around > this problem. Thank you. Its actually the "APC Switched Rack PDUs" that you should look after. You can get an 8 port device for a small budget... > > Your mistake is that you started fenced in normal mode in which it > > will fence all nodes that it can't reach to get around a possible > > split-brain scenario. You need to start fenced in "clean start" > > without fencing mode (read the fenced manpage as it is documented > > there) because you know everything is right. > > Adding clean_start again presumes reconfiguring just like removing a > node and declaring cluster a two_node, and I wanted to avoid > reconfigurations... It's just a matter of starting fenced with "fenced -c" on your two nodes. No cluster.conf fiddling needed at all... Search for "start_daemon fenced" in /etc/init.d/cman and add " -c" behind it. You should remove that after your third node gets back. > Thank you very much. You're welcome. Marc From robejrm at gmail.com Tue Sep 1 11:20:33 2009 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Tue, 1 Sep 2009 13:20:33 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> Message-ID: <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> On Tue, Sep 1, 2009 at 1:05 PM, ESGLinux wrote: > > > 2009/9/1 Juan Ramon Martin Blanco > >> >> >> On Tue, Sep 1, 2009 at 12:38 PM, ESGLinux wrote: >> >>> Hi All, >>> First, sorry if this can be considered Off Topic but my first aproach was >>> using clustering to my problem so I suposse you could have the same problem. >>> >>> I have 2 computers running JBoss and I need to share a directory for the >>> cache (I use OSCache). >>> >>> First I try to use a NFS service on a Red hat Cluster ( I use this >>> reference >>> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/index.html >>> ) >>> >> >> Do you have a shared storage? If the answer is yes, just use gfs and mount >> the filesystem on both machines. >> >> > Nop, I haven?t but your answer makes me a new question. Can I use GFS > directly without making a cluster? > I mean can I attach the iSCSI devices for example, and mount a GFS > filesystem on it without creating a cluster, and a service asociated to this > GFS filesystem? > You should use one iscsi lun shared by both cluster nodes. You can mount a GFS filesystem without locking (lock=nolock) with (correct me if I am wrong) the node not being part of a cluster, but only in one node at a time. You can mount a GFS filesystem created for a certain cluster without having the filesystem configured as a resource, the only requisite is that the nodes mounting the filesystem have to be part of that certain cluster. 
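Concretely, with a made-up device and mount point, that amounts to nothing more than:

  # on each node, once cman is running and the node has joined the cluster
  mount -t gfs /dev/sdb1 /var/cache/oscache

  # the lock_nolock case mentioned above: a single machine outside any
  # cluster, and never more than one node at a time
  mount -t gfs -o lockproto=lock_nolock /dev/sdb1 /mnt/tmp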
Regards, Juanra > > > Thanks > > ESG > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esggrupos at gmail.com Tue Sep 1 12:21:47 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 1 Sep 2009 14:21:47 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> Message-ID: <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> > You should use one iscsi lun shared by both cluster nodes. You can mount a > GFS filesystem without locking (lock=nolock) with (correct me if I am wrong) > the node not being part of a cluster, but only in one node at a time. > You can mount a GFS filesystem created for a certain cluster without > having the filesystem configured as a resource, the only requisite is that > the nodes mounting the filesystem have to be part of that certain cluster. > If I have understand you ok, I need to create a cluster, for example, MYCLUSTER, then create a resource of type GFS filesystem. After that I must create 2 nodes in the cluster, access de iscsi lun from this nodes and finally mount the gfs filesystem. With these I can share this directory between the nodes without the risk of file corruption? Well, in the case I can?t use this approach, is there any way to do this? Thanks for your time, ESG > > Regards, > Juanra > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkovachev at varna.net Tue Sep 1 12:36:35 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 1 Sep 2009 15:36:35 +0300 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> Message-ID: <20090901123347.M11629@varna.net> On Tue, 1 Sep 2009 14:21:47 +0200, ESGLinux wrote > > > > > > You should use one iscsi lun shared by both cluster nodes. You can mount a GFS filesystem without locking (lock=nolock) with (correct me if I am wrong) the node not being part of a cluster, but only in one node at a time. > You can mount a GFS filesystem created for a certain cluster without having the filesystem configured as a resource, the only requisite is that the nodes mounting the filesystem have to be part of that certain cluster. > > > If I have understand you ok, I need to create a cluster, for example, MYCLUSTER, then create a resource of type GFS filesystem. After that I must create 2 nodes in the cluster, access de iscsi lun from this nodes and finally mount the gfs filesystem. > > With these I can share this directory between the nodes without the risk of file corruption? > > Well, in the case I can?t use this approach, is there any way to do this? > if you don't have shared storage, but you have local disks - you may use DRBD instead of iSCSI. 
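A rough sketch of the DRBD side of that, in DRBD 8.x drbd.conf syntax, is shown below. Hostnames, IP addresses and the backing device are purely illustrative, the resource must run dual-primary for GFS, and the replication link should sit on the same network as the cluster communication as noted further down; this is not a complete or tuned configuration:

  resource r0 {
    protocol C;                    # synchronous replication
    startup {
      become-primary-on both;      # primary/primary, needed for GFS
    }
    net {
      allow-two-primaries;
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
    }
    on node1 {                     # must match `uname -n` on the node
      device    /dev/drbd0;
      disk      /dev/sda1;         # local backing disk
      address   192.168.1.1:7788;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/sda1;
      address   192.168.1.2:7788;
      meta-disk internal;
    }
  }

The GFS filesystem is then created on and mounted from /dev/drbd0, not from the backing /dev/sda1 directly.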
About the cluster - you don't need to define any resources - just have a cluster which is quorate to avoid data corruption while accessing the GFS on DRBD > Thanks for your time, > > ESG > > > > Regards, > Juanra > > > From esggrupos at gmail.com Tue Sep 1 12:48:13 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 1 Sep 2009 14:48:13 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <20090901123347.M11629@varna.net> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> <20090901123347.M11629@varna.net> Message-ID: <3128ba140909010548k5d112bd4t4be25ba2deccffee@mail.gmail.com> 2009/9/1 Kaloyan Kovachev > On Tue, 1 Sep 2009 14:21:47 +0200, ESGLinux wrote > > > > > > > > > > > > You should use one iscsi lun shared by both cluster nodes. You can mount > a > GFS filesystem without locking (lock=nolock) with (correct me if I am > wrong) > the node not being part of a cluster, but only in one node at a time. > > You can mount a GFS filesystem created for a certain cluster without > having > the filesystem configured as a resource, the only requisite is that the > nodes > mounting the filesystem have to be part of that certain cluster. > > > > > > If I have understand you ok, I need to create a cluster, for example, > MYCLUSTER, then create a resource of type GFS filesystem. After that I must > create 2 nodes in the cluster, access de iscsi lun from this nodes and > finally > mount the gfs filesystem. > > > > With these I can share this directory between the nodes without the risk > of > file corruption? > > > > Well, in the case I can?t use this approach, is there any way to do this? > > > > if you don't have shared storage, but you have local disks - you may use > DRBD > instead of iSCSI. this looks interesting, any good manual about using DRBD? > About the cluster - you don't need to define any resources - > just have a cluster which is quorate to avoid data corruption while > accessing > the GFS on DRBD > > ok, so I only need the cluster with the 2 nodes and the gfs filesystem formated, for example like this: gfs_mkfs -p lock_dlm -t MyCLUSTER:mydata -j 8 /dev/sda1 When I have done this I can mount /dev/sda1 in both nodes as use it isn?t it? Thanks, ESG > Thanks for your time, > > > > ESG > > > > > > > > Regards, > > Juanra > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ccaulfie at redhat.com Tue Sep 1 12:57:32 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 01 Sep 2009 13:57:32 +0100 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <29ae894c0909010053g15ef0c30y513d5501b776bbf2@mail.gmail.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <4A97F090.9080508@redhat.com> <29ae894c0908281045l6c93c7dbo2d0e4f27c5bab14e@mail.gmail.com> <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> <4A9CC6D2.1000408@redhat.com> <29ae894c0909010053g15ef0c30y513d5501b776bbf2@mail.gmail.com> Message-ID: <4A9D1A3C.60907@redhat.com> On 01/09/09 08:53, brem belguebli wrote: > Hello Chrissie, > I couldn't find the item in the doc (the CMAN FAQ). > Brem > > It's here: http://sources.redhat.com/cluster/wiki/MultiHome Chrissie From kkovachev at varna.net Tue Sep 1 12:58:28 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Tue, 1 Sep 2009 15:58:28 +0300 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <3128ba140909010548k5d112bd4t4be25ba2deccffee@mail.gmail.com> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> <20090901123347.M11629@varna.net> <3128ba140909010548k5d112bd4t4be25ba2deccffee@mail.gmail.com> Message-ID: <20090901125151.M56559@varna.net> On Tue, 1 Sep 2009 14:48:13 +0200, ESGLinux wrote > 2009/9/1 Kaloyan Kovachev > On Tue, 1 Sep 2009 14:21:47 +0200, ESGLinux wrote > > > > > > > > > > > > > You should use one iscsi lun shared by both cluster nodes. You can mount a > GFS filesystem without locking (lock=nolock) with (correct me if I am wrong) > the node not being part of a cluster, but only in one node at a time. > > You can mount a GFS filesystem created for a certain cluster without having > the filesystem configured as a resource, the only requisite is that the nodes > mounting the filesystem have to be part of that certain cluster. > > > > > > If I have understand you ok, I need to create a cluster, for example, > MYCLUSTER, then create a resource of type GFS filesystem. After that I must > create 2 nodes in the cluster, access de iscsi lun from this nodes and finally > mount the gfs filesystem. > > > > With these I can share this directory between the nodes without the risk of > file corruption? > > > > Well, in the case I can?t use this approach, is there any way to do this? > > > > if you don't have shared storage, but you have local disks - you may use DRBD > instead of iSCSI. > > this looks interesting, any good manual about using DRBD? 
> There is a good documentation at http://www.drbd.org/ search for primary-primary mode and make sure the replication channels is the same as for the cluster communication to avoid split-brain and data corruption > About the cluster - you don't need to define any resources - > just have a cluster which is quorate to avoid data corruption while accessing > the GFS on DRBD > > > > ok, so I only need the cluster with the 2 nodes and the gfs filesystem formated, for example like this: > > gfs_mkfs -p lock_dlm -t MyCLUSTER:mydata -j 8 /dev/sda1 > When I have done this I can mount /dev/sda1 in both nodes as use it > [UTF-8?]isn??t it? you should format and mount /dev/drbd0 which is made on top of /dev/sda1, not /dev/sda1 itself > Thanks, > ESG > > > > Thanks for your time, > > > > ESG > > > > > > > > Regards, > > Juanra > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From esggrupos at gmail.com Tue Sep 1 13:02:07 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 1 Sep 2009 15:02:07 +0200 Subject: [Linux-cluster] SEMI OT. Synchronizing jboss cache dir. In-Reply-To: <20090901125151.M56559@varna.net> References: <3128ba140909010338q4bd4c805t27f5c29791970d13@mail.gmail.com> <8a5668960909010359h7ff92fbaj7ba700bc1c742c4c@mail.gmail.com> <3128ba140909010405q387b858bk2d73d723f1ecca79@mail.gmail.com> <8a5668960909010420u1ae1d0d8g46023d3e86c246e2@mail.gmail.com> <3128ba140909010521x3dc2771dn3aa627c62612ffa@mail.gmail.com> <20090901123347.M11629@varna.net> <3128ba140909010548k5d112bd4t4be25ba2deccffee@mail.gmail.com> <20090901125151.M56559@varna.net> Message-ID: <3128ba140909010602s798134b6s327791ba456fd62c@mail.gmail.com> > > > > There is a good documentation at http://www.drbd.org/ search for > primary-primary mode and make sure the replication channels is the same as > for > the cluster communication to avoid split-brain and data corruption > I?ll check it, thanks > > > About the cluster - you don't need to define any resources - > > just have a cluster which is quorate to avoid data corruption while > accessing > > the GFS on DRBD > > > > > > > > ok, so I only need the cluster with the 2 nodes and the gfs filesystem > formated, for example like this: > > > > gfs_mkfs -p lock_dlm -t MyCLUSTER:mydata -j 8 /dev/sda1 > > When I have done this I can mount /dev/sda1 in both nodes as use it > > [UTF-8?]isn??t it? > > you should format and mount /dev/drbd0 which is made on top of /dev/sda1, > not > /dev/sda1 itself > for now this kind of device drbd0 is totally strange for me ;-). I?m going to read about drbd and I suposse I?ll finally understand it, Thanks again, ESG > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tageorgiou at gmail.com Tue Sep 1 13:21:34 2009 From: tageorgiou at gmail.com (Thomas Georgiou) Date: Tue, 1 Sep 2009 09:21:34 -0400 Subject: [Linux-cluster] Problem with Pacemaker and Corosync In-Reply-To: References: <6e4c20e70908310612o120933cema2609513f13be78c@mail.gmail.com> Message-ID: <6e4c20e70909010621s6341695et5785634db84324f3@mail.gmail.com> Attached are the logs with debug enabled. 
Here is corosync.conf: #compatibility: none aisexec { user: root group: root } totem { version: 2 secauth: off threads: 0 token: 1000 join: 60 consenus: 4800 vsftype: none max_messages: 20 clear:node_high_bit: yes interface { ringnumber: 0 bindnetaddr: 198.38.17.40 mcastaddr: 226.94.1.1 mcastport: 5405 } } service { name: pacemaker ver: 0 } logging { fileline: off to_stderr: yes to_syslog: yes to_file: yes logfile: /var/log/corosync.log debug: on timestamp: on } amf { mode: disabled } On Tue, Sep 1, 2009 at 2:33 AM, Andrew Beekhof wrote: > try turning on debug, there's nothing in the logs that indicate why > the lrmd is having a problem > > On Mon, Aug 31, 2009 at 3:12 PM, Thomas Georgiou wrote: >> Hi, >> >> I have installed Pacemaker 1.0.5, Corosync 1.0.0, and Openais 1.0.1 >> from source according to the Clusterlabs docs. ?However, when I go to >> start corosync/pacemaker, I get error messages pertaining to lrm and >> cibadmin -Q hangs and complains that the remote node is not available. >> ?Attached is the corosync log. >> >> Any ideas? >> >> Thomas Georgiou >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: corosync.log Type: application/octet-stream Size: 76914 bytes Desc: not available URL: From brem.belguebli at gmail.com Tue Sep 1 13:51:35 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Tue, 1 Sep 2009 15:51:35 +0200 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <4A9D1A3C.60907@redhat.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <4A97F090.9080508@redhat.com> <29ae894c0908281045l6c93c7dbo2d0e4f27c5bab14e@mail.gmail.com> <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> <4A9CC6D2.1000408@redhat.com> <29ae894c0909010053g15ef0c30y513d5501b776bbf2@mail.gmail.com> <4A9D1A3C.60907@redhat.com> Message-ID: <29ae894c0909010651y1fd4688cyc52726572d65cf81@mail.gmail.com> Thanks Will it be supported in the future ? 2009/9/1, Christine Caulfield : > > On 01/09/09 08:53, brem belguebli wrote: > >> Hello Chrissie, >> I couldn't find the item in the doc (the CMAN FAQ). >> Brem >> >> >> > It's here: > > http://sources.redhat.com/cluster/wiki/MultiHome > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ccaulfie at redhat.com Tue Sep 1 14:00:16 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 01 Sep 2009 15:00:16 +0100 Subject: [Linux-cluster] Use alternate network interfaces for heartbeat in RHCS In-Reply-To: <29ae894c0909010651y1fd4688cyc52726572d65cf81@mail.gmail.com> References: <29ae894c0908280710w3f999b0as1438451bf5869a8e@mail.gmail.com> <4A97E67A.1030506@redhat.com> <29ae894c0908280724p4fc8cbe4g6943a3138f278c1b@mail.gmail.com> <4A97EF74.6090904@redhat.com> <4A97F090.9080508@redhat.com> <29ae894c0908281045l6c93c7dbo2d0e4f27c5bab14e@mail.gmail.com> <29ae894c0908310056u68c1272dud8babe5ac3542f9d@mail.gmail.com> <4A9CC6D2.1000408@redhat.com> <29ae894c0909010053g15ef0c30y513d5501b776bbf2@mail.gmail.com> <4A9D1A3C.60907@redhat.com> <29ae894c0909010651y1fd4688cyc52726572d65cf81@mail.gmail.com> Message-ID: <4A9D28F0.70301@redhat.com> On 01/09/09 14:51, brem belguebli wrote: > Thanks > Will it be supported in the future ? Yes it will. But I can't be sure about just when "the future" is in this case, sorry! Chrissie > 2009/9/1, Christine Caulfield >: > > On 01/09/09 08:53, brem belguebli wrote: > > Hello Chrissie, > I couldn't find the item in the doc (the CMAN FAQ). > Brem > > > > It's here: > > http://sources.redhat.com/cluster/wiki/MultiHome > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From carlopmart at gmail.com Tue Sep 1 16:41:30 2009 From: carlopmart at gmail.com (carlopmart) Date: Tue, 01 Sep 2009 18:41:30 +0200 Subject: [Linux-cluster] fence vmware for vsphere esxi 4 Message-ID: <4A9D4EBA.60907@gmail.com> Hi all, When will be possible to use fence_vmware or fence_vmware_ng with vsphere esxi 4?? Maybe on RHEL/CentOS 5.4?? Thanks. -- CL Martinez carlopmart {at} gmail {d0t} com From pradhanparas at gmail.com Tue Sep 1 17:21:50 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 1 Sep 2009 12:21:50 -0500 Subject: [Linux-cluster] Book Message-ID: <8b711df40909011021p7d06155ch15b5e083fee1c8a3@mail.gmail.com> Is there any book that covers virtualization using Xen and clustering using Red hat Cluster suite in a single book that covers running a HA cluster for virtual machines ? Thanks Paras. From lhh at redhat.com Tue Sep 1 18:41:40 2009 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 01 Sep 2009 14:41:40 -0400 Subject: [Linux-cluster] NFS client failover In-Reply-To: <4A94F87C.7030708@lists.grepular.com> References: <4A93C8CE.7010202@lists.grepular.com> <4A943E31.3080209@lists.grepular.com> <4A94F87C.7030708@lists.grepular.com> Message-ID: <1251830500.3209.544.camel@localhost.localdomain> On Wed, 2009-08-26 at 09:55 +0100, Mike Cardwell wrote: > On 25/08/2009 20:40, Mike Cardwell wrote: > > > I figured that failover would happen more smoothly if the client was > > aware of and in control of what was going on. If the IP suddenly moves > > to another NFS server I don't know how the NFS client will cope with that. > > Well, it seems to cope quite well. The nfs mount "hangs" for a few > seconds whilst the IP moves from one server to another (unavoidable > obviously), but it then picks up from where it was. 
I suspect there will > be file corruption issues with files that are partially written when the > failover happens, but I guess that can't be avoided without a client > side solution. I don't think we've had reports of corruption in the past. When using TCP, the client can hang for a very long time before recovering; using UDP seems to resolve this. -- Lon From lhh at redhat.com Tue Sep 1 19:19:41 2009 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 01 Sep 2009 15:19:41 -0400 Subject: [Linux-cluster] 3 node cluster and quorum disk? In-Reply-To: <20090826161128.1e32721c@pc-jsosic.srce.hr> References: <20090826161128.1e32721c@pc-jsosic.srce.hr> Message-ID: <1251832781.3209.548.camel@localhost.localdomain> On Wed, 2009-08-26 at 16:11 +0200, Jakov Sosic wrote: > Hi. > > I have a situation - when two nodes are up in 3 node cluster, and one > node goes down, cluster looses quorate - although I'm using qdiskd... > > > > > label="SAS-qdisk" status_file="/tmp/qdisk"/> If that doesn't fix it entirely, get rid of status_file, decrease interval, and increase tko. Try: interval=2 tko=12 ? -- Lon From jfriesse at redhat.com Wed Sep 2 07:30:48 2009 From: jfriesse at redhat.com (Jan Friesse) Date: Wed, 02 Sep 2009 09:30:48 +0200 Subject: [Linux-cluster] fence vmware for vsphere esxi 4 In-Reply-To: <4A9D4EBA.60907@gmail.com> References: <4A9D4EBA.60907@gmail.com> Message-ID: <4A9E1F28.3030506@redhat.com> Hi, I'm pretty sure, that old fence_vmware will don't work on ESXi, because ESXi (at least ESXi 3.5) doesn't have support for ssh in DOM-0, and we are using it. Fence_vmware_ng should work correctly (because using VI Perl API), but it's not officially supported. In case it doesn't work, please let me know. Regards, Honza carlopmart wrote: > Hi all, > > When will be possible to use fence_vmware or fence_vmware_ng with > vsphere esxi 4?? Maybe on RHEL/CentOS 5.4?? > > Thanks. > From jakov.sosic at srce.hr Wed Sep 2 09:47:09 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Wed, 2 Sep 2009 11:47:09 +0200 Subject: [Linux-cluster] How to disable node? In-Reply-To: <1251803481.12201.10.camel@marc> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> <1251800976.10463.43.camel@marc> <20090901124851.7daf2c75@pc-jsosic.srce.hr> <1251803481.12201.10.camel@marc> Message-ID: <20090902114709.66f6c5f3@nb-jsosic> On Tue, 01 Sep 2009 13:11:21 +0200 "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" wrote: > Its actually the "APC Switched Rack PDUs" that you should look after. > You can get an 8 port device for a small budget... Is this it: http://www.apc.com/products/family/index.cfm?id=70 It's still too expensive - AP7920 is around 800-900$ in my country... I was hoping to get two for that price. -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From mad at wol.de Wed Sep 2 10:15:27 2009 From: mad at wol.de (Marc - A. Dahlhaus [ Administration | Westermann GmbH ]) Date: Wed, 02 Sep 2009 12:15:27 +0200 Subject: [Linux-cluster] How to disable node? 
In-Reply-To: <20090902114709.66f6c5f3@nb-jsosic> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> <1251800976.10463.43.camel@marc> <20090901124851.7daf2c75@pc-jsosic.srce.hr> <1251803481.12201.10.camel@marc> <20090902114709.66f6c5f3@nb-jsosic> Message-ID: <1251886527.10505.9.camel@marc> Am Mittwoch, den 02.09.2009, 11:47 +0200 schrieb Jakov Sosic: > On Tue, 01 Sep 2009 13:11:21 +0200 > "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" > wrote: > > > Its actually the "APC Switched Rack PDUs" that you should look after. > > You can get an 8 port device for a small budget... > > Is this it: > > http://www.apc.com/products/family/index.cfm?id=70 > > It's still too expensive - AP7920 is around 800-900$ in my country... I > was hoping to get two for that price. > That's the one, the street price here is around 350? per device. From corey.kovacs at gmail.com Wed Sep 2 10:33:51 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Wed, 2 Sep 2009 06:33:51 -0400 Subject: [Linux-cluster] dealing with oom-killer.... Message-ID: <7d6e8da40909020333h405f3f82w928f65b44afffc51@mail.gmail.com> A colleague has a 5 node cluster with 4GB ram in each node. It's not enough for the cluster and more ram is on the way. The problem though is that until the ram arrives, there is risk of oom-killer (which he found out the other day) firing up and putting the node into a state which made it utterly useless but still looked good to the cluster. We could of course disable oom-killer but that's a workaround, not a fix. I am wondering if the cluster responding to oom-killer firing up and fencing the offending node is possible and if so, how others might have done it. Seems like it should just be handled by the cluster tho. Maybe have cman put a message across the openais "bus" like, "Hey, losing my brain here, someone whak me"... Thanks Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Wed Sep 2 10:47:39 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 02 Sep 2009 11:47:39 +0100 Subject: [Linux-cluster] dealing with oom-killer.... In-Reply-To: <7d6e8da40909020333h405f3f82w928f65b44afffc51@mail.gmail.com> References: <7d6e8da40909020333h405f3f82w928f65b44afffc51@mail.gmail.com> Message-ID: <4A9E4D4B.5000003@redhat.com> On 02/09/09 11:33, Corey Kovacs wrote: > A colleague has a 5 node cluster with 4GB ram in each node. It's not > enough for the cluster and more ram is on the way. The problem though is > that until the ram arrives, there is risk of oom-killer (which he found > out the other day) firing up and putting the node into a state which > made it utterly useless but still looked good to the cluster. We could > of course disable oom-killer but that's a workaround, not a fix. > > I am wondering if the cluster responding to oom-killer firing up and > fencing the offending node is possible and if so, how others might have > done it. Seems like it should just be handled by the cluster tho. Maybe > have cman put a message across the openais "bus" like, "Hey, losing my > brain here, someone whak me"... > I suppose you could give cman a large value for /proc//oom_score so that it is the first thing to be killed if the system runs out of memory. That should guarantee that it will be fenced by the other nodes ... provided they have enough memory to remain quorate! 
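As a rough sketch of that idea (assuming a RHEL 5 era kernel, where the writable knob is actually /proc/<pid>/oom_adj -- oom_score itself is read-only -- and assuming the cman stack shows up as aisexec/fenced):

    # make the cluster daemons the OOM killer's first victims, so the node
    # stops answering the cluster and gets fenced rather than limping on
    for p in $(pidof aisexec fenced); do
        echo 15 > /proc/$p/oom_adj    # 15 = most likely to be killed
    done
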
Chrissie From brem.belguebli at gmail.com Wed Sep 2 11:14:04 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Wed, 2 Sep 2009 13:14:04 +0200 Subject: [Linux-cluster] Re: Fencing question in geo cluster (dual sites clustering) In-Reply-To: <29ae894c0908210227r85df80fm173af6452d22a5b2@mail.gmail.com> References: <29ae894c0908210227r85df80fm173af6452d22a5b2@mail.gmail.com> Message-ID: <29ae894c0909020414s518a5530n538ca3f5e21377f1@mail.gmail.com> Hi, Any idea or comment on this. Thanks Brem CF link attached to diagram that describes the setup. http://1.bp.blogspot.com/_mz9iIrpv_qo/Si1NmQ2QNmI/AAAAAAAADP4/fV8j_ZsGlBw/s1600-h/Drawing1.png 2009/8/21, brem belguebli : > > Hi, > > I'm trying to find out what best fencing solution could fit a dual sites > cluster. > > Cluster is equally sized on each site (2 nodes/site), each site hosting a > SAN array so that each node from any site can see the 2 arrays. > > Quorum disk (iscsi LUN) is hosted on a 3rd site. > > SAN and LAN using the same telco infrastructure (2 redundant DWDM loops). > > In case something happens at Telco level (both DWDM loops are broken) that > makes 1 of the 2 sites completely isolated from the rest of the world, > the nodes at the good site (the one still operationnal) won't be able to > fence any node from the wrong site (the one that is isolated) as there is no > way for them to reach their ILO's or do any SAN fencing as the switches at > the wrong site are no more reachable. > > As qdiskd is not reachable from the wrong nodes, they end up being rebooted > by qdisk, but there is a short time (a few seconds) during which the wrong > nodes are still seing their local SAN array storage and may potentially have > written data on it. > > Any ideas or comments on how to ensure data integrity in such setup ? > > Regards > > Brem > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Wed Sep 2 11:57:54 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 02 Sep 2009 12:57:54 +0100 Subject: [Linux-cluster] dealing with oom-killer.... In-Reply-To: <7d6e8da40909020333h405f3f82w928f65b44afffc51@mail.gmail.com> References: <7d6e8da40909020333h405f3f82w928f65b44afffc51@mail.gmail.com> Message-ID: <4A9E5DC2.4030107@redhat.com> On 02/09/09 11:33, Corey Kovacs wrote: > A colleague has a 5 node cluster with 4GB ram in each node. It's not > enough for the cluster and more ram is on the way. The problem though is > that until the ram arrives, there is risk of oom-killer (which he found > out the other day) firing up and putting the node into a state which > made it utterly useless but still looked good to the cluster. We could > of course disable oom-killer but that's a workaround, not a fix. > > I am wondering if the cluster responding to oom-killer firing up and > fencing the offending node is possible and if so, how others might have > done it. Seems like it should just be handled by the cluster tho. Maybe > have cman put a message across the openais "bus" like, "Hey, losing my > brain here, someone whak me"... > I suppose you could give cman a large value for /proc//oom_score so that it is the first thing to be killed if the system runs out of memory. That should guarantee that it will be fenced by the other nodes ... provided they have enough memory to remain quorate! Chrissie From jakov.sosic at srce.hr Wed Sep 2 15:59:38 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Wed, 2 Sep 2009 17:59:38 +0200 Subject: [Linux-cluster] How to disable node? 
In-Reply-To: <1251886527.10505.9.camel@marc> References: <20090831223053.5461ad55@nb-jsosic> <20090831225918.33c892eb@nb-jsosic> <4A9C3FEE.3000309@wol.de> <20090901112648.3e32c88d@nb-jsosic> <1251800976.10463.43.camel@marc> <20090901124851.7daf2c75@pc-jsosic.srce.hr> <1251803481.12201.10.camel@marc> <20090902114709.66f6c5f3@nb-jsosic> <1251886527.10505.9.camel@marc> Message-ID: <20090902175938.3d186aa5@pc-jsosic.srce.hr> On Wed, 02 Sep 2009 12:15:27 +0200 "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" wrote: > That's the one, the street price here is around 350? per device. Well, then it's time for me to call some of german emigrants :) -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From alfredo.moralejo at roche.com Wed Sep 2 16:48:50 2009 From: alfredo.moralejo at roche.com (Moralejo, Alfredo) Date: Wed, 2 Sep 2009 18:48:50 +0200 Subject: [Linux-cluster] Re: Fencing question in geo cluster (dual sites clustering) In-Reply-To: <29ae894c0909020414s518a5530n538ca3f5e21377f1@mail.gmail.com> References: <29ae894c0908210227r85df80fm173af6452d22a5b2@mail.gmail.com> <29ae894c0909020414s518a5530n538ca3f5e21377f1@mail.gmail.com> Message-ID: What kind of data replication will be used? Regards, Alfredo ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of brem belguebli Sent: Wednesday, September 02, 2009 1:14 PM To: linux clustering Subject: [Linux-cluster] Re: Fencing question in geo cluster (dual sites clustering) Hi, Any idea or comment on this. Thanks Brem CF link attached to diagram that describes the setup. http://1.bp.blogspot.com/_mz9iIrpv_qo/Si1NmQ2QNmI/AAAAAAAADP4/fV8j_ZsGlBw/s1600-h/Drawing1.png 2009/8/21, brem belguebli >: Hi, I'm trying to find out what best fencing solution could fit a dual sites cluster. Cluster is equally sized on each site (2 nodes/site), each site hosting a SAN array so that each node from any site can see the 2 arrays. Quorum disk (iscsi LUN) is hosted on a 3rd site. SAN and LAN using the same telco infrastructure (2 redundant DWDM loops). In case something happens at Telco level (both DWDM loops are broken) that makes 1 of the 2 sites completely isolated from the rest of the world, the nodes at the good site (the one still operationnal) won't be able to fence any node from the wrong site (the one that is isolated) as there is no way for them to reach their ILO's or do any SAN fencing as the switches at the wrong site are no more reachable. As qdiskd is not reachable from the wrong nodes, they end up being rebooted by qdisk, but there is a short time (a few seconds) during which the wrong nodes are still seing their local SAN array storage and may potentially have written data on it. Any ideas or comments on how to ensure data integrity in such setup ? Regards Brem -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brem.belguebli at gmail.com Wed Sep 2 18:11:23 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Wed, 2 Sep 2009 20:11:23 +0200 Subject: [Linux-cluster] Re: Fencing question in geo cluster (dual sites clustering) In-Reply-To: References: <29ae894c0908210227r85df80fm173af6452d22a5b2@mail.gmail.com> <29ae894c0909020414s518a5530n538ca3f5e21377f1@mail.gmail.com> Message-ID: <29ae894c0909021111q6ebf0113k97a7107f2e5c416b@mail.gmail.com> Hi Alfredo, For the moment, it is a POC, and I'm basing the whole thing on the RAID1 mdadm resource script I have submitted. I'm also considering the possibility of using a Continuous Access (HP arrays like EMC's SRDF functionnality) but still need raid manager binaries etc ... and the time and inspiration to write the scripts. Ideally, I would tend to privilege LVM mirror, but it still has some points to be addressed as SPOF on mirrorlog etc... Brem 2009/9/2 Moralejo, Alfredo > What kind of data replication will be used? > > > > Regards, > > > > Alfredo > > > ------------------------------ > > *From:* linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] *On Behalf Of *brem belguebli > *Sent:* Wednesday, September 02, 2009 1:14 PM > *To:* linux clustering > *Subject:* [Linux-cluster] Re: Fencing question in geo cluster (dual sites > clustering) > > > > Hi, > > > > Any idea or comment on this. > > > > Thanks > > > > Brem > > > > > > > > CF link attached to diagram that describes the setup. > > http://1.bp.blogspot.com/_mz9iIrpv_qo/Si1NmQ2QNmI/AAAAAAAADP4/fV8j_ZsGlBw/s1600-h/Drawing1.png > > > 2009/8/21, brem belguebli : > > Hi, > > > > I'm trying to find out what best fencing solution could fit a dual sites > cluster. > > > > Cluster is equally sized on each site (2 nodes/site), each site hosting a > SAN array so that each node from any site can see the 2 arrays. > > > > Quorum disk (iscsi LUN) is hosted on a 3rd site. > > > > SAN and LAN using the same telco infrastructure (2 redundant DWDM loops). > > > > In case something happens at Telco level (both DWDM loops are broken) that > makes 1 of the 2 sites completely isolated from the rest of the world, > > the nodes at the good site (the one still operationnal) won't be able to > fence any node from the wrong site (the one that is isolated) as there is no > way for them to reach their ILO's or do any SAN fencing as the switches at > the wrong site are no more reachable. > > > > As qdiskd is not reachable from the wrong nodes, they end up being rebooted > by qdisk, but there is a short time (a few seconds) during which the wrong > nodes are still seing their local SAN array storage and may potentially have > written data on it. > > > > Any ideas or comments on how to ensure data integrity in such setup ? > > > > Regards > > > > Brem > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From plebeuz at ig.com.br Thu Sep 3 14:06:47 2009 From: plebeuz at ig.com.br (Daniel Viana Auler(Plebeuz)) Date: Thu, 03 Sep 2009 11:06:47 -0300 Subject: [Linux-cluster] GFS - Cluster Message-ID: <4A9FCD77.1020806@ig.com.br> Hello, People, can i use gfs without a storage? I want to use a local device and then make a cluster in 2 other machines to use gfs for test. Att, Plebeuz -- From bmr at redhat.com Thu Sep 3 14:24:16 2009 From: bmr at redhat.com (Bryn M. 
Reeves) Date: Thu, 03 Sep 2009 15:24:16 +0100 Subject: [Linux-cluster] GFS - Cluster In-Reply-To: <4A9FCD77.1020806@ig.com.br> References: <4A9FCD77.1020806@ig.com.br> Message-ID: <1251987856.25346.168.camel@breeves.fab.redhat.com> On Thu, 2009-09-03 at 11:06 -0300, Daniel Viana Auler(Plebeuz) wrote: > Hello, > People, can i use gfs without a storage? I want to use a local > device and then make a cluster in 2 other machines to use gfs for test. Checkout the software iscsi target or the gnbd package. Regards, Bryn. From gordan at bobich.net Thu Sep 3 14:53:07 2009 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 3 Sep 2009 15:53:07 +0100 Subject: [Linux-cluster] GFS - Cluster Message-ID: <4A4B4FF016C5FEC3@> (added by '') Or DRBD. -----Original Message----- From: "Bryn M. Reeves" To: "linux clustering" Sent: 03/09/09 15:24 Subject: Re: [Linux-cluster] GFS - Cluster On Thu, 2009-09-03 at 11:06 -0300, Daniel Viana Auler(Plebeuz) wrote: > Hello, > People, can i use gfs without a storage? I want to use a local > device and then make a cluster in 2 other machines to use gfs for test. Checkout the software iscsi target or the gnbd package. Regards, Bryn. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From maciej.grela at nsn.com Fri Sep 4 07:50:31 2009 From: maciej.grela at nsn.com (Maciej Grela) Date: Fri, 04 Sep 2009 09:49:31 +0159 Subject: [Linux-cluster] GFS - Cluster In-Reply-To: <4A9FCD77.1020806@ig.com.br> References: <4A9FCD77.1020806@ig.com.br> Message-ID: <4AA0C6A3.2050607@nsn.com> ext Daniel Viana Auler(Plebeuz) pisze: > Hello, > People, can i use gfs without a storage? I want to use a > local device and then make a cluster in 2 other machines to use gfs > for test. > > Att, > > Plebeuz You could use nbd to export the blockdevice to the second node. In case of gfs you need some way for *both* the nodes to see the same block device. Haven't tried the nbd approach myself though. Best regards, Maciej Grela From Alain.Moulle at bull.net Fri Sep 4 09:46:38 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Fri, 04 Sep 2009 11:46:38 +0200 Subject: [Linux-cluster] Question about "ccs_tool update" Message-ID: <4AA0E1FE.70903@bull.net> Hi, With this release : cman-3.0.2-1.fc11.x86_64 it seems that we can't do ccs_tool update anymore : ccs_tool update /etc/cluster/cluster.conf Unknown command, update. Try 'ccs_tool help' for help. and effectively the help does not list anymore options update (neither upgrade). Therefore, what is the new way to make it dynamically update the configuration ? (in former releases, we used to do ccs_tool update ... and then cman_tool version -r ...) Thanks for your help Alain From fdinitto at redhat.com Fri Sep 4 10:33:40 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 04 Sep 2009 12:33:40 +0200 Subject: [Linux-cluster] Re: [Cluster-devel] Can't manage virtual servers after upgrade In-Reply-To: References: Message-ID: <1252060420.6387.0.camel@cerberus.int.fabbione.net> hi, in future please use linux-cluster at redhat.com mailing list or file a bugzilla. cluster-devel is meant for development only topics. Thanks Fabio On Fri, 2009-09-04 at 14:26 +0400, Alexander wrote: > Hello! > > I have upgrade 3 servers in cluster to RHEL 5.4 and now i can't start virtual machine service via luci. 
In /var/log/messages i see errors: > > Sep 4 11:42:55 hwcl-n1 clurgmgrd[5374]: start on vm "vps-nagios" returned 1 (generic error) > Sep 4 11:42:55 hwcl-n1 clurgmgrd[5374]: #68: Failed to start vm:vps-nagios; return value: 1 > Sep 4 11:42:55 hwcl-n1 clurgmgrd[5374]: Stopping service vm:vps-nagios > > After upgrade, via luci web-interface i can't add new virtual machine service. Looks like cluster soft don't know, that server booted with xen kernel and xen is started. > When i boot servers with kernel without xen and install KVM hypervisor, then luci can create new service for virtual machines, but i need use xen hypervisor. > > Can anybody help - where is problem with xen hypervisor? Probably, some rpm packet is missing? (but i update server via command "yum update"). > > Thank You. > > With best regards, Alexander. From fdinitto at redhat.com Fri Sep 4 10:34:19 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 04 Sep 2009 12:34:19 +0200 Subject: [Linux-cluster] Re: [Cluster-devel] luci SSL error: SSL_ERROR_ZERO_RETURN In-Reply-To: <1251964843.31750.17.camel@leodolter.obvsg.at> References: <1251964843.31750.17.camel@leodolter.obvsg.at> Message-ID: <1252060459.6387.2.camel@cerberus.int.fabbione.net> hi, in future please use linux-cluster at redhat.com mailing list or file a bugzilla. cluster-devel is meant for development only topics. Thanks Fabio On Thu, 2009-09-03 at 10:00 +0200, Ulrich Leodolter wrote: > Hello, > > luci is unable to get ssl certs from ricci. > i have setup luci/ricci as described in redhat manual. > > i tried this on RHEL5.3 x86_64 and today after upgrade > to RHEL5.4 x86_64. > > there is no problem on RHEL5.3 i386 machine, > looks like it is a x86_64 ssl problem???? > > > after click on "View SSL cert fingerprints" is see this message: > > The following errors occurred: > > Error reading from myhost.mydomain:11111: SSL error: SSL_ERROR_ZERO_RETURN > > > syslog messages: > > Sep 3 11:14:47 myhost luci: Luci startup succeeded > Sep 3 11:14:47 myhost luci: Listening on port 8084; accessible via URL https://myhost.mydomain:8084 > Sep 3 11:16:25 myhost luci[7987]: Error reading from myhost.mydomain:11111: SSL error: SSL_ERROR_ZERO_RETURN > Sep 3 11:16:33 myhost luci[7987]: Error reading from myhost.mydomain:11111: SSL error: SSL_ERROR_ZERO_RETURN > Sep 3 11:16:33 myhost luci[7987]: Unable to establish an SSL connection to myhost.mydomain:11111: unable to open temp file > > > > > any tips ???? > thx > ulrich > > > > From fdinitto at redhat.com Fri Sep 4 10:42:08 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 04 Sep 2009 12:42:08 +0200 Subject: [Linux-cluster] Question about "ccs_tool update" In-Reply-To: <4AA0E1FE.70903@bull.net> References: <4AA0E1FE.70903@bull.net> Message-ID: <1252060928.6387.8.camel@cerberus.int.fabbione.net> On Fri, 2009-09-04 at 11:46 +0200, Alain.Moulle wrote: > Hi, > > With this release : cman-3.0.2-1.fc11.x86_64 > it seems that we can't do ccs_tool update anymore : > > ccs_tool update /etc/cluster/cluster.conf > Unknown command, update. > Try 'ccs_tool help' for help. > > and effectively the help does not list anymore options update (neither > upgrade). > > Therefore, what is the new way to make it dynamically update the > configuration ? The configuration distribution across nodes is now delegate to luci/ricci via ccs_sync command. The old ccsd ccs_tool bits are gone. Assuming your configuration is identical on all nodes you can issue, on one node only, cman_tool version -r $newversion. 
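As a sketch, the whole cycle might then look like this (assuming the stock /etc/cluster/cluster.conf location, and ricci running on every node so ccs_sync can push the file -- plain scp of the file works too):

    vi /etc/cluster/cluster.conf     # bump config_version= in the <cluster> tag
    ccs_sync                         # distribute the new file to the other nodes
    cman_tool version -r 0           # tell the running cluster to load it
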
$newversion is either 0 (autodetect the version from cluster.conf and check that is newer/higher than the runtime config) or the exact version you want to load. Note that we are still working on smoothing a few corners in the new configuration system and that a bad config could be problematic for the cluster. Fabio From Alain.Moulle at bull.net Fri Sep 4 10:51:56 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Fri, 04 Sep 2009 12:51:56 +0200 Subject: [Linux-cluster] Question about "ccs_tool update" In-Reply-To: <1252060928.6387.8.camel@cerberus.int.fabbione.net> References: <4AA0E1FE.70903@bull.net> <1252060928.6387.8.camel@cerberus.int.fabbione.net> Message-ID: <4AA0F14C.2050104@bull.net> Hi Fabio, and many thanks. But just another precision : you mean that ccs_sync is making the job now , in a hidden way when cman_tool -r version is executed , right ? but does the fact that cluster.conf is in another place than /etc/cluster matter for ccs_sync to work fine ? because I just tried : [root at oberon3 ~]# ccs_sync help Unable to parse /etc/cluster/cluster.conf: No such file or directory Does that mean that ccs_sync does not take in account the /etc/sysconfig/cman file ? Thanks again Alain Fabio M. Di Nitto a ?crit : > On Fri, 2009-09-04 at 11:46 +0200, Alain.Moulle wrote: > >> Hi, >> >> With this release : cman-3.0.2-1.fc11.x86_64 >> it seems that we can't do ccs_tool update anymore : >> >> ccs_tool update /etc/cluster/cluster.conf >> Unknown command, update. >> Try 'ccs_tool help' for help. >> >> and effectively the help does not list anymore options update (neither >> upgrade). >> >> Therefore, what is the new way to make it dynamically update the >> configuration ? >> > > The configuration distribution across nodes is now delegate to > luci/ricci via ccs_sync command. The old ccsd ccs_tool bits are gone. > > Assuming your configuration is identical on all nodes you can issue, on > one node only, cman_tool version -r $newversion. > > $newversion is either 0 (autodetect the version from cluster.conf and > check that is newer/higher than the runtime config) or the exact version > you want to load. > > Note that we are still working on smoothing a few corners in the new > configuration system and that a bad config could be problematic for the > cluster. > > Fabio > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ntadmin at fi.upm.es Fri Sep 4 12:08:11 2009 From: ntadmin at fi.upm.es (Miguel Sanchez) Date: Fri, 04 Sep 2009 14:08:11 +0200 Subject: [Linux-cluster] How backup domU partition from dom0? Message-ID: <4AA1032B.2090306@fi.upm.es> Hi. I have two hosts forming a cluster for run xen vm's. Each vm has a disk which is corresponding to a clvm logic volume within dom0. I pretended to backup the lv's from dom0 doing a 'kpart -a' and a readonly mount (with the vm running). Probably it is not very correct (but the alternative snapshot was not possible with clvm). Most of times, the operation is ok, and the backup finishes with problems, but in other cases 'mount -r /dev/mapper/device /path' does not return and it stays consuming time indefinitely. I cannot kill the process y have to fence the node. How could I make the domU backups within dom0 without these problems? Thanks. Miguel. From fajar at fajar.net Fri Sep 4 13:15:17 2009 From: fajar at fajar.net (Fajar A. Nugraha) Date: Fri, 4 Sep 2009 20:15:17 +0700 Subject: [Linux-cluster] How backup domU partition from dom0? 
In-Reply-To: <4AA1032B.2090306@fi.upm.es> References: <4AA1032B.2090306@fi.upm.es> Message-ID: <7207d96f0909040615r26a36041p7ed978c73ad43753@mail.gmail.com> On Fri, Sep 4, 2009 at 7:08 PM, Miguel Sanchez wrote: > Hi. I have two hosts forming a cluster for run xen vm's. Each vm has a disk > which is corresponding to a clvm logic volume within dom0. > I pretended to backup the lv's from dom0 doing a 'kpart -a' and a readonly > mount (with the vm running). Probably it is not very correct (but the > alternative snapshot was not possible with clvm). Did you know that if you mount an ext3 partition READ ONLY you could actually do a WRITE to that partition to replay the journal, and so cause possible data corruption? > Most of times, the operation is ok, and the backup finishes with problems, > but in other cases 'mount -r /dev/mapper/device /path' does not return and > it stays consuming time indefinitely. I cannot kill the process y have to > fence the node. > > How could I make the domU backups within dom0 without these problems? You can't. Not without clvm snapshot. What you could probably do : - do backup from within domU - do backup from the SAN, if it supports snapshot. -- Fajar From fdinitto at redhat.com Fri Sep 4 13:47:11 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 04 Sep 2009 15:47:11 +0200 Subject: [Linux-cluster] Question about "ccs_tool update" In-Reply-To: <4AA0F14C.2050104@bull.net> References: <4AA0E1FE.70903@bull.net> <1252060928.6387.8.camel@cerberus.int.fabbione.net> <4AA0F14C.2050104@bull.net> Message-ID: <1252072031.6387.14.camel@cerberus.int.fabbione.net> On Fri, 2009-09-04 at 12:51 +0200, Alain.Moulle wrote: > Hi Fabio, > and many thanks. But just another precision : > you mean that ccs_sync is making the job > now , in a hidden way when cman_tool -r version is > executed , right ? No, cman_tool doesn't invoke ccs_sync. > but does the fact that cluster.conf is in another place > than /etc/cluster matter for ccs_sync to work fine ? > because I just tried : > [root at oberon3 ~]# ccs_sync help > Unable to parse /etc/cluster/cluster.conf: No such file or directory > Does that mean that ccs_sync does not take in account the > /etc/sysconfig/cman file ? I have CC'ed Ryan that wrote ccs_sync. I really have no idea as I do scp manually my cluster.conf around. Fabio PS Pretty please, can you stop sending html colored messages? It's really hard to read black on blue. From rmccabe at redhat.com Fri Sep 4 15:16:36 2009 From: rmccabe at redhat.com (Ryan McCabe) Date: Fri, 4 Sep 2009 11:16:36 -0400 Subject: [Linux-cluster] Question about "ccs_tool update" In-Reply-To: <1252072031.6387.14.camel@cerberus.int.fabbione.net> References: <4AA0E1FE.70903@bull.net> <1252060928.6387.8.camel@cerberus.int.fabbione.net> <4AA0F14C.2050104@bull.net> <1252072031.6387.14.camel@cerberus.int.fabbione.net> Message-ID: <20090904151636.GB30811@redhat.com> On Fri, Sep 04, 2009 at 03:47:11PM +0200, Fabio M. Di Nitto wrote: > > because I just tried : > > [root at oberon3 ~]# ccs_sync help > > Unable to parse /etc/cluster/cluster.conf: No such file or directory > > Does that mean that ccs_sync does not take in account the > > /etc/sysconfig/cman file ? > > I have CC'ed Ryan that wrote ccs_sync. I really have no idea as I do scp > manually my cluster.conf around. Hi, It doesn't take /etc/sysconfig/cman into account currently. Could you please open a bz ticket about this, and I'll try to get it fixed as soon as possible. 
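In the meantime, one possible workaround (a sketch only, the non-default path below is purely illustrative) is to leave a symlink where ccs_sync does look:

    # ccs_sync currently reads /etc/cluster/cluster.conf unconditionally,
    # so point that path at the file kept in the non-default location
    ln -s /some/other/path/cluster.conf /etc/cluster/cluster.conf
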
Thanks, Ryan From fdinitto at redhat.com Fri Sep 4 15:22:22 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 04 Sep 2009 17:22:22 +0200 Subject: [Linux-cluster] Question about "ccs_tool update" In-Reply-To: <20090904151636.GB30811@redhat.com> References: <4AA0E1FE.70903@bull.net> <1252060928.6387.8.camel@cerberus.int.fabbione.net> <4AA0F14C.2050104@bull.net> <1252072031.6387.14.camel@cerberus.int.fabbione.net> <20090904151636.GB30811@redhat.com> Message-ID: <1252077742.6387.18.camel@cerberus.int.fabbione.net> On Fri, 2009-09-04 at 11:16 -0400, Ryan McCabe wrote: > On Fri, Sep 04, 2009 at 03:47:11PM +0200, Fabio M. Di Nitto wrote: > > > because I just tried : > > > [root at oberon3 ~]# ccs_sync help > > > Unable to parse /etc/cluster/cluster.conf: No such file or directory > > > Does that mean that ccs_sync does not take in account the > > > /etc/sysconfig/cman file ? > > > > I have CC'ed Ryan that wrote ccs_sync. I really have no idea as I do scp > > manually my cluster.conf around. > > Hi, > > It doesn't take /etc/sysconfig/cman into account currently. Could you > please open a bz ticket about this, and I'll try to get it fixed as soon > as possible. Ryan, it needs to take into account COROSYNC_CLUSTER_CONFIG_FILE env var either from the running environment or loaded via either /etc/sysconfig/cluster or /etc/sysconfig/cman for rpm based distros and /etc/default/cluster or /etc/default/cman on deb based distros. cman is always preferred over cluster. Fabio From ntadmin at fi.upm.es Fri Sep 4 19:42:24 2009 From: ntadmin at fi.upm.es (Miguel Sanchez) Date: Fri, 04 Sep 2009 21:42:24 +0200 Subject: [Linux-cluster] How backup domU partition from dom0? In-Reply-To: <7207d96f0909040615r26a36041p7ed978c73ad43753@mail.gmail.com> References: <4AA1032B.2090306@fi.upm.es> <7207d96f0909040615r26a36041p7ed978c73ad43753@mail.gmail.com> Message-ID: <4AA16DA0.6030602@fi.upm.es> Fajar A. Nugraha escribi?: > On Fri, Sep 4, 2009 at 7:08 PM, Miguel Sanchez wrote: > >> Hi. I have two hosts forming a cluster for run xen vm's. Each vm has a disk >> which is corresponding to a clvm logic volume within dom0. >> I pretended to backup the lv's from dom0 doing a 'kpart -a' and a readonly >> mount (with the vm running). Probably it is not very correct (but the >> alternative snapshot was not possible with clvm). >> > > Did you know that if you mount an ext3 partition READ ONLY you could > actually do a WRITE to that partition to replay the journal, and so > cause possible data corruption? > No, I don't. Then could the partition within dom0 be defined readonly with 'blockdev --setro' and avoid any write, direcly in the data as well as possible replaying the journal? -- Miguel. From fajar at fajar.net Sat Sep 5 02:04:16 2009 From: fajar at fajar.net (Fajar A. Nugraha) Date: Sat, 5 Sep 2009 09:04:16 +0700 Subject: [Linux-cluster] How backup domU partition from dom0? In-Reply-To: <4AA16DA0.6030602@fi.upm.es> References: <4AA1032B.2090306@fi.upm.es> <7207d96f0909040615r26a36041p7ed978c73ad43753@mail.gmail.com> <4AA16DA0.6030602@fi.upm.es> Message-ID: <7207d96f0909041904r6b1a16a0n1664d8941ee8e2d1@mail.gmail.com> On Sat, Sep 5, 2009 at 2:42 AM, Miguel Sanchez wrote: > Fajar A. Nugraha escribi?: >> >> Did you know that if you mount an ext3 partition READ ONLY you could >> actually do a WRITE to that partition to replay the journal, and so >> cause possible data corruption? >> > > No, I don't. 
Then could the partition within dom0 be defined readonly with > 'blockdev --setro' and avoid any write, direcly in the data as well as > possible replaying the journal? AFAIK if you do that you won't be able to replay the journal, and kernel will refuse to mount it :P -- Fajar From Luis.Cerezo at pgs.com Tue Sep 8 20:40:36 2009 From: Luis.Cerezo at pgs.com (Luis Cerezo) Date: Tue, 8 Sep 2009 15:40:36 -0500 Subject: [Linux-cluster] 3 node cluster and quorum disk? In-Reply-To: <1251832781.3209.548.camel@localhost.localdomain> References: <20090826161128.1e32721c@pc-jsosic.srce.hr> <1251832781.3209.548.camel@localhost.localdomain> Message-ID: <41AFBA96-80A5-4342-9A80-5178FF7E0C1A@pgs.com> how many votes do the other nodes have? Luis E. Cerezo Global IT GV: +1 412 223 7396 On Sep 1, 2009, at 2:19 PM, Lon Hohberger wrote: > On Wed, 2009-08-26 at 16:11 +0200, Jakov Sosic wrote: >> Hi. >> >> I have a situation - when two nodes are up in 3 node cluster, and one >> node goes down, cluster looses quorate - although I'm using qdiskd... > > >> >> >> >> >> > label="SAS-qdisk" status_file="/tmp/qdisk"/> > > > > If that doesn't fix it entirely, get rid of status_file, decrease > interval, and increase tko. Try: > > interval=2 tko=12 ? > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster This e-mail, including any attachments and response string, may contain proprietary information which is confidential and may be legally privileged. It is for the intended recipient only. If you are not the intended recipient or transmission error has misdirected this e-mail, please notify the author by return e-mail and delete this message and any attachment immediately. If you are not the intended recipient you must not use, disclose, distribute, forward, copy, print or rely on this e-mail in any way except as permitted by the author. From pradhanparas at gmail.com Tue Sep 8 20:57:00 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 8 Sep 2009 15:57:00 -0500 Subject: [Linux-cluster] 3 node cluster and quorum disk? In-Reply-To: <20090826161128.1e32721c@pc-jsosic.srce.hr> References: <20090826161128.1e32721c@pc-jsosic.srce.hr> Message-ID: <8b711df40909081357p35d14345kaff6199b742efb76@mail.gmail.com> On Wed, Aug 26, 2009 at 9:11 AM, Jakov Sosic wrote: > Hi. > > I have a situation - when two nodes are up in 3 node cluster, and one > node goes down, cluster looses quorate - although I'm using qdiskd... > > I think that problem is in switching qdisk master from one node to > another. In that case, rgmanager disables all running services, which is > not acceptable situation. Services are currently set to > autostart="0" because cluster is in evaluation phase. > > Here is my config: > > > > ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? > > ? ? ? ? > ? ? ? ? > > > ? ? ? ? > ? ? ? ? > > ? ? ? ? > ? ? ? ? ? ? ? ?label="SAS-qdisk" status_file="/tmp/qdisk"/> > > ? 
? ? ? > ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ?ipaddr="" login="" passwd="" name="node01-ipmi"/> > ? ? ? ? ? ? ? ? ? ? ? ?ipaddr="" login="" passwd="" name="node02-ipmi"/> > ? ? ? ? ? ? ? ? ? ? ? ?ipaddr="" login="" passwd="" name="node03-ipmi"/> > ? ? ? ? > > > > Should I change any of the timeouts? > > > > > > > -- > | ? ?Jakov Sosic ? ?| ? ?ICQ: 28410271 ? ?| ? PGP: 0x965CAE2D ? | > ================================================================= > | start fighting cancer -> http://www.worldcommunitygrid.org/ ? | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > I ran into the same problem. I am also running a 3 nodes cluster with qdisk. Before, my node1 , node2 and nod3 has 1, 1, 2 votes and qdisk had 2. I ran into the same problem as u are having now. Then I change the votes from 2 to 1 to node 3 and added a vote to qdisk . Now it is running fine. I don't know what happed before. I have tested a lot but didnot succeeded. Now my, interval = 1 and tko=10 in my case. Paras. From alan.zg at gmail.com Tue Sep 8 22:34:11 2009 From: alan.zg at gmail.com (Alan A) Date: Tue, 8 Sep 2009 17:34:11 -0500 Subject: [Linux-cluster] Multicasting problems Message-ID: It has come to the point where our cluster production configuration has halted due to the unexpected issues with multicasting on LAN/WAN. The problem is that the firewall enabled on the switch ports does not support multicasting, and between cluster nodes and the routers lays firewall. Nodes -> Switch with integrated Firewall devices -> Router We are aware of problems encountered with Cisco switches and are trying to clear some things. For instance in RHEL Knowledgebase article 5933 it states: *The recommended method is to enable multicast routing for a given vlan so that the Catalyst will act as the IGMP querier. This consists of the following steps:* * * 1. *Enabling multicast on the switch globally* 2. *Choosing the vlan the cluster nodes are using* 3. *Turning on PIM routing for that subnet* My Questions: Can we enable PIM routing on the Server NIC itself without using dedicated network device? Meaning IGMP multicast would be managed by the NIC's itself from each node, can the nodes awarnes function this way? Any suggestions on how to get around firewall issue without purchesing firewalls with routing tables? Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdake at redhat.com Tue Sep 8 22:38:55 2009 From: sdake at redhat.com (Steven Dake) Date: Tue, 08 Sep 2009 15:38:55 -0700 Subject: [Linux-cluster] Multicasting problems In-Reply-To: References: Message-ID: <1252449535.18865.7.camel@localhost.localdomain> On Tue, 2009-09-08 at 17:34 -0500, Alan A wrote: > It has come to the point where our cluster production configuration > has halted due to the unexpected issues with multicasting on LAN/WAN. > > The problem is that the firewall enabled on the switch ports does not > support multicasting, and between cluster nodes and the routers lays > firewall. > > Nodes -> Switch with integrated Firewall devices -> Router > > We are aware of problems encountered with Cisco switches and are > trying to clear some things. For instance in RHEL Knowledgebase > article 5933 it states: > > > The recommended method is to enable multicast routing for a given vlan > so that the Catalyst will act as the IGMP querier. This consists of > the following steps: > > > > 1. 
Enabling multicast on the switch globally > > 2. Choosing the vlan the cluster nodes are using > > 3. Turning on PIM routing for that subnet > > > My Questions: > > Can we enable PIM routing on the Server NIC itself without using > dedicated network device? Meaning IGMP multicast would be managed by > the NIC's itself from each node, can the nodes awarnes function this > way? > > Any suggestions on how to get around firewall issue without purchesing > firewalls with routing tables? > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. > I'm afraid only Cisco (and maybe some Cisco experts on this list) knows the answers to your questions. I suggest you contact your Cisco TAC for advice on configuring their products. They can help you achieve best results. Regards -steve > > > -- > Alan A. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From alan.zg at gmail.com Wed Sep 9 03:30:17 2009 From: alan.zg at gmail.com (Alan A) Date: Tue, 8 Sep 2009 22:30:17 -0500 Subject: [Linux-cluster] Multicasting problems In-Reply-To: <1252449535.18865.7.camel@localhost.localdomain> References: <1252449535.18865.7.camel@localhost.localdomain> Message-ID: Thank you for your input. We are contacting Cisco to get their input on this, but we have to explore RH options if any as well. Would there be a way to enable NIC (network device on the server) and make it IGMP aware somehow. In essence how can I make NIC's manage IGMP and PIM, is there a way? I know I can make a NIC in Linux become a router, but how do I make it IGMP and PIM aware on each node? On Tue, Sep 8, 2009 at 5:38 PM, Steven Dake wrote: > On Tue, 2009-09-08 at 17:34 -0500, Alan A wrote: > > It has come to the point where our cluster production configuration > > has halted due to the unexpected issues with multicasting on LAN/WAN. > > > > The problem is that the firewall enabled on the switch ports does not > > support multicasting, and between cluster nodes and the routers lays > > firewall. > > > > Nodes -> Switch with integrated Firewall devices -> Router > > > > We are aware of problems encountered with Cisco switches and are > > trying to clear some things. For instance in RHEL Knowledgebase > > article 5933 it states: > > > > > > The recommended method is to enable multicast routing for a given vlan > > so that the Catalyst will act as the IGMP querier. This consists of > > the following steps: > > > > > > > > 1. Enabling multicast on the switch globally > > > > 2. Choosing the vlan the cluster nodes are using > > > > 3. Turning on PIM routing for that subnet > > > > > > My Questions: > > > > Can we enable PIM routing on the Server NIC itself without using > > dedicated network device? Meaning IGMP multicast would be managed by > > the NIC's itself from each node, can the nodes awarnes function this > > way? > > > > Any suggestions on how to get around firewall issue without purchesing > > firewalls with routing tables? > > > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. > > > > I'm afraid only Cisco (and maybe some Cisco experts on this list) knows > the answers to your questions. I suggest you contact your Cisco TAC for > advice on configuring their products. They can help you achieve best > results. > > Regards > -steve > > > > > > > > -- > > Alan A. 
> > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakov.sosic at srce.hr Wed Sep 9 09:08:02 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Wed, 9 Sep 2009 11:08:02 +0200 Subject: [Linux-cluster] Multicasting problems In-Reply-To: References: Message-ID: <20090909110802.5a8c5113@nb-jsosic> On Tue, 8 Sep 2009 17:34:11 -0500 Alan A wrote: > It has come to the point where our cluster production configuration > has halted due to the unexpected issues with multicasting on LAN/WAN. > > The problem is that the firewall enabled on the switch ports does not > support multicasting, and between cluster nodes and the routers lays > firewall. > > Nodes -> Switch with integrated Firewall devices -> Router > > We are aware of problems encountered with Cisco switches and are > trying to clear some things. For instance in RHEL Knowledgebase > article 5933 it states: > > *The recommended method is to enable multicast routing for a given > vlan so that the Catalyst will act as the IGMP querier. This consists > of the following steps:* > > * * > > 1. > > *Enabling multicast on the switch globally* > 2. > > *Choosing the vlan the cluster nodes are using* > 3. > > *Turning on PIM routing for that subnet* > > > My Questions: > > Can we enable PIM routing on the Server NIC itself without using > dedicated network device? Meaning IGMP multicast would be managed by > the NIC's itself from each node, can the nodes awarnes function this > way? > > Any suggestions on how to get around firewall issue without purchesing > firewalls with routing tables? > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. It seems that I was right with my diagnostics :D Why don't you create VLAN with private subnet addresses, in for example 10.0.0.0/8, and allow PIM on that VLAN, and trunk it with regular wlan that you use now. And then configure RHCS to heartbeat over this new private VLAN with enabled PIM? You wouldn't need the firewall because the VLAN would be used only for cluster communication, and it could be fully isolated. It does not need to be routed at all - because heartbeat packages go only between nodes. So no external access to that VLAN would be enabled. It's perfectly safe. If you need help on configuring either Cisco 6500 or RHEL for VLAN trunking please ask. Take a look at 802.1Q standard to understand the issue: http://en.wikipedia.org/wiki/IEEE_802.1Q -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | From alan.zg at gmail.com Wed Sep 9 12:32:57 2009 From: alan.zg at gmail.com (Alan A) Date: Wed, 9 Sep 2009 07:32:57 -0500 Subject: [Linux-cluster] Multicasting problems In-Reply-To: <20090909110802.5a8c5113@nb-jsosic> References: <20090909110802.5a8c5113@nb-jsosic> Message-ID: The problem lays in creating the VLAN that allows PIM. Firewall and the switch are one physical device, and once the firewall is on it manages directly ports on the switch, and firewall is not capable (according to our LAN /WAN engineers) at least not on this Cisco model of managing or allowing PIM. 
For PIM we need other dedicated device that would handle Sparse/Dense mode before the firewall, which is a major problem. That is why I am interested in what can be done on the server side, what options can we enable on the NIC's directly to mimic PIM. Switch will allow IGMPv2 communication, but in our tests without Router like device with PIM enabled, we were unable to form the cluster. Each node woud send IGMP messages and it would be totally unaware of other nodes sending their messages. On Wed, Sep 9, 2009 at 4:08 AM, Jakov Sosic wrote: > On Tue, 8 Sep 2009 17:34:11 -0500 > Alan A wrote: > > > It has come to the point where our cluster production configuration > > has halted due to the unexpected issues with multicasting on LAN/WAN. > > > > The problem is that the firewall enabled on the switch ports does not > > support multicasting, and between cluster nodes and the routers lays > > firewall. > > > > Nodes -> Switch with integrated Firewall devices -> Router > > > > We are aware of problems encountered with Cisco switches and are > > trying to clear some things. For instance in RHEL Knowledgebase > > article 5933 it states: > > > > *The recommended method is to enable multicast routing for a given > > vlan so that the Catalyst will act as the IGMP querier. This consists > > of the following steps:* > > > > * * > > > > 1. > > > > *Enabling multicast on the switch globally* > > 2. > > > > *Choosing the vlan the cluster nodes are using* > > 3. > > > > *Turning on PIM routing for that subnet* > > > > > > My Questions: > > > > Can we enable PIM routing on the Server NIC itself without using > > dedicated network device? Meaning IGMP multicast would be managed by > > the NIC's itself from each node, can the nodes awarnes function this > > way? > > > > Any suggestions on how to get around firewall issue without purchesing > > firewalls with routing tables? > > > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. > > It seems that I was right with my diagnostics :D > > > Why don't you create VLAN with private subnet addresses, in for example > 10.0.0.0/8, and allow PIM on that VLAN, and trunk it with regular > wlan that you use now. And then configure RHCS to heartbeat over > this new private VLAN with enabled PIM? You wouldn't need the firewall > because the VLAN would be used only for cluster communication, and it > could be fully isolated. It does not need to be routed at all - because > heartbeat packages go only between nodes. So no external access to that > VLAN would be enabled. It's perfectly safe. > > If you need help on configuring either Cisco 6500 or RHEL for VLAN > trunking please ask. Take a look at 802.1Q standard to understand the > issue: > > http://en.wikipedia.org/wiki/IEEE_802.1Q > > > > -- > | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | > ================================================================= > | start fighting cancer -> http://www.worldcommunitygrid.org/ | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luis.Cerezo at pgs.com Wed Sep 9 18:22:19 2009 From: Luis.Cerezo at pgs.com (Luis Cerezo) Date: Wed, 9 Sep 2009 13:22:19 -0500 Subject: [Linux-cluster] Multicasting problems In-Reply-To: References: <20090909110802.5a8c5113@nb-jsosic> Message-ID: this may be completely unhelpful... have you tried changing the mcast address of the cluster? Luis E. 
Cerezo Global IT GV: +1 412 223 7396 On Sep 9, 2009, at 7:32 AM, Alan A wrote: The problem lays in creating the VLAN that allows PIM. Firewall and the switch are one physical device, and once the firewall is on it manages directly ports on the switch, and firewall is not capable (according to our LAN /WAN engineers) at least not on this Cisco model of managing or allowing PIM. For PIM we need other dedicated device that would handle Sparse/Dense mode before the firewall, which is a major problem. That is why I am interested in what can be done on the server side, what options can we enable on the NIC's directly to mimic PIM. Switch will allow IGMPv2 communication, but in our tests without Router like device with PIM enabled, we were unable to form the cluster. Each node woud send IGMP messages and it would be totally unaware of other nodes sending their messages. On Wed, Sep 9, 2009 at 4:08 AM, Jakov Sosic > wrote: On Tue, 8 Sep 2009 17:34:11 -0500 Alan A > wrote: > It has come to the point where our cluster production configuration > has halted due to the unexpected issues with multicasting on LAN/WAN. > > The problem is that the firewall enabled on the switch ports does not > support multicasting, and between cluster nodes and the routers lays > firewall. > > Nodes -> Switch with integrated Firewall devices -> Router > > We are aware of problems encountered with Cisco switches and are > trying to clear some things. For instance in RHEL Knowledgebase > article 5933 it states: > > *The recommended method is to enable multicast routing for a given > vlan so that the Catalyst will act as the IGMP querier. This consists > of the following steps:* > > * * > > 1. > > *Enabling multicast on the switch globally* > 2. > > *Choosing the vlan the cluster nodes are using* > 3. > > *Turning on PIM routing for that subnet* > > > My Questions: > > Can we enable PIM routing on the Server NIC itself without using > dedicated network device? Meaning IGMP multicast would be managed by > the NIC's itself from each node, can the nodes awarnes function this > way? > > Any suggestions on how to get around firewall issue without purchesing > firewalls with routing tables? > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. It seems that I was right with my diagnostics :D Why don't you create VLAN with private subnet addresses, in for example 10.0.0.0/8, and allow PIM on that VLAN, and trunk it with regular wlan that you use now. And then configure RHCS to heartbeat over this new private VLAN with enabled PIM? You wouldn't need the firewall because the VLAN would be used only for cluster communication, and it could be fully isolated. It does not need to be routed at all - because heartbeat packages go only between nodes. So no external access to that VLAN would be enabled. It's perfectly safe. If you need help on configuring either Cisco 6500 or RHEL for VLAN trunking please ask. Take a look at 802.1Q standard to understand the issue: http://en.wikipedia.org/wiki/IEEE_802.1Q -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | start fighting cancer -> http://www.worldcommunitygrid.org/ | -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Alan A. 
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster This e-mail, including any attachments and response string, may contain proprietary information which is confidential and may be legally privileged. It is for the intended recipient only. If you are not the intended recipient or transmission error has misdirected this e-mail, please notify the author by return e-mail and delete this message and any attachment immediately. If you are not the intended recipient you must not use, disclose, distribute, forward, copy, print or rely on this e-mail in any way except as permitted by the author. From alan.zg at gmail.com Wed Sep 9 18:38:03 2009 From: alan.zg at gmail.com (Alan A) Date: Wed, 9 Sep 2009 13:38:03 -0500 Subject: [Linux-cluster] Multicasting problems In-Reply-To: References: <20090909110802.5a8c5113@nb-jsosic> Message-ID: Haven't done that but I am not positive that it would help in our setting. My tests were to establish private VLAN with 3 private addresses for 3 node cluster. I hade node1 on 192.168.10.21, node2 192.168.10.22, and node3 on 192.168.10.23. I could ping each node from each node, so node1 would see node2 and node3, node2 would see node1 and node3, and node3 would see node1 and node2. I made /etc/host entries and checked with the 'route' command that the device eth2 on each node was dedicated to access private network on 192.168.10.2x, as it showed. There was no additional network devise on the Cisco switch, just the 3 cluster nodes. I issued cman_tool status command and got the multicast address - checked that it is the same on all three nodes and when I pinged the address I just got the dead air..... Nothing... I tried this by forcing cluster via sysclt command to use IGMPv1 v2 and v3... None worked. On Wed, Sep 9, 2009 at 1:22 PM, Luis Cerezo wrote: > this may be completely unhelpful... > > have you tried changing the mcast address of the cluster? > > Luis E. Cerezo > Global IT > GV: +1 412 223 7396 > > On Sep 9, 2009, at 7:32 AM, Alan A wrote: > > The problem lays in creating the VLAN that allows PIM. Firewall and the > switch are one physical device, and once the firewall is on it manages > directly ports on the switch, and firewall is not capable (according to our > LAN /WAN engineers) at least not on this Cisco model of managing or allowing > PIM. For PIM we need other dedicated device that would handle Sparse/Dense > mode before the firewall, which is a major problem. That is why I am > interested in what can be done on the server side, what options can we > enable on the NIC's directly to mimic PIM. Switch will allow IGMPv2 > communication, but in our tests without Router like device with PIM enabled, > we were unable to form the cluster. Each node woud send IGMP messages and it > would be totally unaware of other nodes sending their messages. > > On Wed, Sep 9, 2009 at 4:08 AM, Jakov Sosic jakov.sosic at srce.hr>> wrote: > On Tue, 8 Sep 2009 17:34:11 -0500 > Alan A > wrote: > > > It has come to the point where our cluster production configuration > > has halted due to the unexpected issues with multicasting on LAN/WAN. > > > > The problem is that the firewall enabled on the switch ports does not > > support multicasting, and between cluster nodes and the routers lays > > firewall. > > > > Nodes -> Switch with integrated Firewall devices -> Router > > > > We are aware of problems encountered with Cisco switches and are > > trying to clear some things. 
For instance in RHEL Knowledgebase > > article 5933 it states: > > > > *The recommended method is to enable multicast routing for a given > > vlan so that the Catalyst will act as the IGMP querier. This consists > > of the following steps:* > > > > * * > > > > 1. > > > > *Enabling multicast on the switch globally* > > 2. > > > > *Choosing the vlan the cluster nodes are using* > > 3. > > > > *Turning on PIM routing for that subnet* > > > > > > My Questions: > > > > Can we enable PIM routing on the Server NIC itself without using > > dedicated network device? Meaning IGMP multicast would be managed by > > the NIC's itself from each node, can the nodes awarnes function this > > way? > > > > Any suggestions on how to get around firewall issue without purchesing > > firewalls with routing tables? > > > > Cisco switch model is: switch 6509 running 12.2(18) SXF and IGMP v2. > > It seems that I was right with my diagnostics :D > > > Why don't you create VLAN with private subnet addresses, in for example > 10.0.0.0/8, and allow PIM on that VLAN, and trunk it > with regular > wlan that you use now. And then configure RHCS to heartbeat over > this new private VLAN with enabled PIM? You wouldn't need the firewall > because the VLAN would be used only for cluster communication, and it > could be fully isolated. It does not need to be routed at all - because > heartbeat packages go only between nodes. So no external access to that > VLAN would be enabled. It's perfectly safe. > > If you need help on configuring either Cisco 6500 or RHEL for VLAN > trunking please ask. Take a look at 802.1Q standard to understand the > issue: > > http://en.wikipedia.org/wiki/IEEE_802.1Q > > > > -- > | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | > ================================================================= > | start fighting cancer -> http://www.worldcommunitygrid.org/ | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Alan A. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > This e-mail, including any attachments and response string, may contain > proprietary information which is confidential and may be legally privileged. > It is for the intended recipient only. If you are not the intended recipient > or transmission error has misdirected this e-mail, please notify the author > by return e-mail and delete this message and any attachment immediately. If > you are not the intended recipient you must not use, disclose, distribute, > forward, copy, print or rely on this e-mail in any way except as permitted > by the author. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luis.Cerezo at pgs.com Wed Sep 9 20:47:54 2009 From: Luis.Cerezo at pgs.com (Luis Cerezo) Date: Wed, 9 Sep 2009 15:47:54 -0500 Subject: [Linux-cluster] Multicasting problems In-Reply-To: References: <20090909110802.5a8c5113@nb-jsosic> Message-ID: <04FEBDD4-54DA-4457-A68D-BD4BE379D023@pgs.com> try adding something like to your cluster.conf (of course uptick the rev, ccs_tool update..) -luis Luis E. Cerezo Global IT GV: +1 412 223 7396 On Sep 9, 2009, at 1:38 PM, Alan A wrote: Haven't done that but I am not positive that it would help in our setting. 
From Luis.Cerezo at pgs.com  Wed Sep  9 20:47:54 2009
From: Luis.Cerezo at pgs.com (Luis Cerezo)
Date: Wed, 9 Sep 2009 15:47:54 -0500
Subject: [Linux-cluster] Multicasting problems
In-Reply-To: 
References: <20090909110802.5a8c5113@nb-jsosic>
Message-ID: <04FEBDD4-54DA-4457-A68D-BD4BE379D023@pgs.com>

try adding something like to your cluster.conf (of course uptick the rev, ccs_tool update..)

-luis

Luis E. Cerezo
Global IT
GV: +1 412 223 7396

On Sep 9, 2009, at 1:38 PM, Alan A wrote:

Haven't done that, but I am not positive that it would help in our setting.

My tests were to establish a private VLAN with 3 private addresses for a 3 node cluster. I had node1 on 192.168.10.21, node2 on 192.168.10.22, and node3 on 192.168.10.23. I could ping each node from each node, so node1 would see node2 and node3, node2 would see node1 and node3, and node3 would see node1 and node2. I made /etc/hosts entries and checked with the 'route' command that the device eth2 on each node was dedicated to the private network on 192.168.10.2x, as it showed. There was no additional network device on the Cisco switch, just the 3 cluster nodes.

I issued the cman_tool status command and got the multicast address - checked that it is the same on all three nodes - and when I pinged that address I just got dead air... Nothing... I also tried forcing the cluster, via sysctl, to use IGMPv1, v2 and v3... None worked.

On Wed, Sep 9, 2009 at 1:22 PM, Luis Cerezo <Luis.Cerezo at pgs.com> wrote:

this may be completely unhelpful...

have you tried changing the mcast address of the cluster?

Luis E. Cerezo
Global IT
GV: +1 412 223 7396

On Sep 9, 2009, at 7:32 AM, Alan A wrote:

The problem lies in creating the VLAN that allows PIM. The firewall and the switch are one physical device; once the firewall is on, it directly manages the ports on the switch, and the firewall is not capable (according to our LAN/WAN engineers), at least not on this Cisco model, of managing or allowing PIM. For PIM we would need another dedicated device to handle Sparse/Dense mode in front of the firewall, which is a major problem. That is why I am interested in what can be done on the server side - what options we can enable on the NICs directly to mimic PIM. The switch will allow IGMPv2 communication, but in our tests, without a router-like device with PIM enabled, we were unable to form the cluster. Each node would send IGMP messages and be totally unaware of the other nodes sending theirs.

On Wed, Sep 9, 2009 at 4:08 AM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
On Tue, 8 Sep 2009 17:34:11 -0500, Alan A <alan.zg at gmail.com> wrote:

> It has come to the point where our cluster production configuration
> has halted due to unexpected issues with multicasting on the LAN/WAN.
>
> The problem is that the firewall enabled on the switch ports does not
> support multicasting, and between the cluster nodes and the routers lies
> the firewall.
>
> Nodes -> Switch with integrated firewall devices -> Router
>
> We are aware of the problems encountered with Cisco switches and are
> trying to clear some things up. For instance, RHEL Knowledgebase article
> 5933 states:
>
> "The recommended method is to enable multicast routing for a given
> vlan so that the Catalyst will act as the IGMP querier. This consists
> of the following steps:
>
>   1. Enabling multicast on the switch globally
>   2. Choosing the vlan the cluster nodes are using
>   3. Turning on PIM routing for that subnet"
>
> My questions:
>
> Can we enable PIM routing on the server NIC itself, without using a
> dedicated network device? Meaning, IGMP multicast would be managed by
> the NICs themselves on each node - can the node awareness function this
> way?
>
> Any suggestions on how to get around the firewall issue without
> purchasing firewalls with routing tables?
>
> The Cisco switch model is: 6509 running 12.2(18)SXF and IGMP v2.

It seems that I was right with my diagnostics :D

Why don't you create a VLAN with private subnet addresses, for example in 10.0.0.0/8, allow PIM on that VLAN, and trunk it with the regular vlan that you use now? Then configure RHCS to heartbeat over this new private VLAN with PIM enabled. You wouldn't need the firewall, because the VLAN would be used only for cluster communication and it could be fully isolated. It does not need to be routed at all - heartbeat packets go only between nodes - so no external access to that VLAN would be enabled. It's perfectly safe.

If you need help configuring either the Cisco 6500 or RHEL for VLAN trunking, please ask. Take a look at the 802.1Q standard to understand the issue:

http://en.wikipedia.org/wiki/IEEE_802.1Q

--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/   |

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Alan A.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
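The example element Luis refers to was lost when his HTML attachment was scrubbed, so the snippet below is only a reconstruction of the kind of change he appears to be describing: pinning cman to an explicit multicast address in cluster.conf, bumping the revision, and pushing it out with ccs_tool. The cluster name, config_version and address are placeholders, not the contents of the original attachment.

    <cluster name="mycluster" config_version="3">
      <cman>
        <!-- override the automatically derived multicast group -->
        <multicast addr="239.192.100.1"/>
      </cman>
      <!-- clusternodes, fencedevices, rm sections left as they were -->
    </cluster>

    # "uptick the rev" (config_version above), then propagate to all members:
    ccs_tool update /etc/cluster/cluster.conf

Whether changing the address actually helps depends on why the switch is not forwarding the traffic; if IGMP snooping is dropping unregistered groups at the firewall/switch, a different 239.x.x.x address alone may not change the outcome.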
From esggrupos at gmail.com  Thu Sep 10 10:51:38 2009
From: esggrupos at gmail.com (ESGLinux)
Date: Thu, 10 Sep 2009 12:51:38 +0200
Subject: [Linux-cluster] do I have a fence DRAC device?
In-Reply-To: <3128ba140908180535o4f62b011vc41e5ec6517ac388@mail.gmail.com>
References: <3128ba140908100324l6cdb4c34ra5f5edb39c6903e9@mail.gmail.com> <8b711df40908101134t69b8e12cof6cc551809421e45@mail.gmail.com> <3128ba140908170350ge619930w5c17368ff0d3cf42@mail.gmail.com> <8b711df40908171128j1fc18525nfd7df01d7604cda0@mail.gmail.com> <3128ba140908180535o4f62b011vc41e5ec6517ac388@mail.gmail.com>
Message-ID: <3128ba140909100351s544df2e5k8878324907ae09b5@mail.gmail.com>

Hi all,

After a long time without the opportunity to check the boot process of my server to see the message, I have finally done it.

I can see the following message:

    BMC Revision 2.05
    Remote Access Configuration Utility 1.25

I enter the utility by pressing F2, and I have configured the IP to 192.168.1.250.

Now I can ping that IP, but nothing more:

    ping 192.168.1.250
    PING 192.168.1.250 (192.168.1.250) 56(84) bytes of data.
    64 bytes from 192.168.1.250: icmp_seq=1 ttl=128 time=60.3 ms

Does anybody know what I need to do to be able to manage the server?

Thanks,

ESG
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
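What ESG describes sounds like a baseboard BMC reachable over IPMI-on-LAN rather than a full DRAC card, and Juanra's reply below points to ipmitool. Assuming IPMI over LAN has been enabled in that F2 utility with a user and password, the kind of commands involved would look like the following sketch; the address is the one from the post, while the credentials are placeholders.

    # from another machine, query the BMC and check the power state
    # (use -I lanplus instead of -I lan if the BMC requires IPMI 2.0)
    ipmitool -I lan -H 192.168.1.250 -U admin -P secret chassis power status

    # the same interface is what the RHCS IPMI fence agent drives, e.g.:
    fence_ipmilan -a 192.168.1.250 -l admin -p secret -o status

If these respond, the BMC can then be configured as a fence device in cluster.conf using fence_ipmilan.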
From robejrm at gmail.com  Thu Sep 10 11:04:48 2009
From: robejrm at gmail.com (Juan Ramon Martin Blanco)
Date: Thu, 10 Sep 2009 13:04:48 +0200
Subject: [Linux-cluster] do I have a fence DRAC device?
In-Reply-To: <3128ba140909100351s544df2e5k8878324907ae09b5@mail.gmail.com>
References: <3128ba140908100324l6cdb4c34ra5f5edb39c6903e9@mail.gmail.com> <8b711df40908101134t69b8e12cof6cc551809421e45@mail.gmail.com> <3128ba140908170350ge619930w5c17368ff0d3cf42@mail.gmail.com> <8b711df40908171128j1fc18525nfd7df01d7604cda0@mail.gmail.com> <3128ba140908180535o4f62b011vc41e5ec6517ac388@mail.gmail.com> <3128ba140909100351s544df2e5k8878324907ae09b5@mail.gmail.com>
Message-ID: <8a5668960909100404u3d86f7cbv6be51d4530527b23@mail.gmail.com>

On Thu, Sep 10, 2009 at 12:51 PM, ESGLinux <esggrupos at gmail.com> wrote:

> Hi all,
> after a long time without the opportunity to check the boot process of my
> server to see the message, I have finally done it.
>
> I can see the following message:
> BMC Revision 2.05
> Remote Access Configuration Utility 1.25
>
> I enter the utility by pressing F2, and I have configured the IP to
> 192.168.1.250.
>
> Now I can ping that IP, but nothing more:
> ping 192.168.1.250
> PING 192.168.1.250 (192.168.1.250) 56(84) bytes of data.
> 64 bytes from 192.168.1.250: icmp_seq=1 ttl=128 time=60.3 ms
>
> Does anybody know what I need to do to be able to manage the server?
>
From another machine, connect to the BMC IP using the ipmitool utility.

man ipmitool ;)

Greetings,
Juanra

> Thanks
>
> ESG
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gianluca.cecchi at gmail.com  Thu Sep 10 11:29:27 2009
From: gianluca.cecchi at gmail.com (Gianluca Cecchi)
Date: Thu, 10 Sep 2009 13:29:27 +0200
Subject: [Linux-cluster] where exactly are cluster services stopped during shutdown?
Message-ID: <561c252c0909100429q7671ad0cj3880792a603a24a5@mail.gmail.com>

Hello,
suppose I have a service srvname defined in chkconfig, and I would like to insert it as a resource/service in my cluster.conf (version 3 of the cluster stack as found in F11, but answers for version 2 as in RHEL 5 are also welcome if the behaviour differs).

So my cluster.conf is something like this: