From kjalleda at gmail.com  Mon May  1 03:07:29 2006
From: kjalleda at gmail.com (Kishore Jalleda)
Date: Sun, 30 Apr 2006 23:07:29 -0400
Subject: [Linux-cluster] MySQL on GFS benchmarks
In-Reply-To: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
Message-ID: <78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.com>

No matter what you do, a standalone server would be faster than a clustered
architecture, if it is GFS over SAN, or even the MySQL cluster, due to
obvious reasons of latency invloved. Anyway what exactly are you trying to
build with MySQL, I mean what kind of performance you want from MySQL, may
be you could try Replication or if you want good scalability, then you would
be better off with the MySQL cluster.

Kishore Jalleda
http://kjalleda.googlepages.com/projects



On 4/26/06, Sander van Beek - Elexis <sander at elexis.nl> wrote:
> Hi all,
>
> We did a quick benchmark on our 2 node rhel4 testcluster with gfs and
> a gnbd storage server. The results were very sad. One of the nodes
> (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when
> running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node
> GFS over GNBD setup and inserts on both nodes at the same time, we
> only could do 80 inserts per second. I'm very interested in the
> perfomance others got in a similar setup. Would the performance
> increase when we use software based iscsi instead of gnbd?
> Or should we simply buy SAN equipment? Does anyone have statistics to
> compare a standalone mysql setup to a small gfs cluster using a san?
>
>
> With best regards,
> Sander van Beek
>
> ---------------------------------------
>
> Ing. S. van Beek
> Elexis
> Marketing 9
> 6921 RE Duiven
> The Netherlands
>
> Tel:    +31 (0)26 7110329
> Mob:    +31 (0)6 28395109
> Fax:    +31 (0)318 611112
> Email: sander at elexis.nl
> Web:    http://www.elexis.nl
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060430/549facf3/attachment.htm>

From sander at elexis.nl  Mon May  1 13:02:03 2006
From: sander at elexis.nl (Sander van Beek - Elexis)
Date: Mon, 01 May 2006 15:02:03 +0200
Subject: [Linux-cluster] MySQL on GFS benchmarks
In-Reply-To: <a6d13c780604301500h68be8c1dpc07fce59876770d0@mail.gmail.co m>
References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
	<a6d13c780604301500h68be8c1dpc07fce59876770d0@mail.gmail.com>
Message-ID: <7.0.1.0.0.20060501150057.0232a040@elexis.nl>

Hi,

Im trying (software based) iSCSI right now, could post my benchmarks 
if other people are interested.

Best regards,
Sander

At 00:00 1-5-2006, you wrote:
>Sander,
>
>It depends, if you are looking for performance, definately SAN.
>iSCSI  might have a better performance over GNBD.
>I found this on google 
><http://www.bwbug.org/docs/RedHat-GNBD-Ethernet-SAN.pdf>http://www.bwbug.org/docs/RedHat-GNBD-Ethernet-SAN.pdf 
>
>It has some detais about GFS on SAN and on GNBD, It might help though.3
>Good Luck and keep us posted.
>
>Att.
>FTM
>
>
>On 4/26/06, Sander van Beek - Elexis 
><<mailto:sander at elexis.nl>sander at elexis.nl> wrote:
>Hi all,
>
>We did a quick benchmark on our 2 node rhel4 testcluster with gfs and
>a gnbd storage server. The results were very sad. One of the nodes
>(p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when
>running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node
>GFS over GNBD setup and inserts on both nodes at the same time, we
>only could do 80 inserts per second. I'm very interested in the
>perfomance others got in a similar setup. Would the performance
>increase when we use software based iscsi instead of gnbd?
>Or should we simply buy SAN equipment? Does anyone have statistics to
>compare a standalone mysql setup to a small gfs cluster using a san?
>
>
>With best regards,
>Sander van Beek
>
>---------------------------------------
>
>Ing. S. van Beek
>Elexis
>Marketing 9
>6921 RE Duiven
>The Netherlands
>
>Tel:    +31 (0)26 7110329
>Mob:    +31 (0)6 28395109
>Fax:    +31 (0)318 611112
>Email: <mailto:sander at elexis.nl>sander at elexis.nl
>Web:    <http://www.elexis.nl>http://www.elexis.nl
>
>--
>Linux-cluster mailing list
><mailto:Linux-cluster at redhat.com>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>No virus found in this incoming message.
>Checked by AVG Free Edition.
>Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date: 28-4-2006

Met vriendelijke groet,
Sander van Beek

---------------------------------------

Ing. S. van Beek
Elexis
Marketing 9
6921 RE Duiven

Tel:    +31 (0)26 7110329
Mob:    +31 (0)6 28395109
Fax:    +31 (0)318 611112
Email: sander at elexis.nl
Web:    http://www.elexis.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/29cd24f3/attachment.htm>

From sander at elexis.nl  Mon May  1 13:08:34 2006
From: sander at elexis.nl (Sander van Beek - Elexis)
Date: Mon, 01 May 2006 15:08:34 +0200
Subject: [Linux-cluster] MySQL on GFS benchmarks
In-Reply-To: <78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.co m>
References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
	<78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.com>
Message-ID: <7.0.1.0.0.20060501150206.022e2600@elexis.nl>

Hi,

Ofcourse I understand that the performance will be less because of 
extra overhead. My goal is to be as close to standalone server 
performance as possible, with a certain budget in mind. One of the 
demands I have is that the cluster solution I'm building is 
transparent to clients, highly available, load balanced, and has to 
be scalable up to 2-8 servers. Both replication and mysql-cluster are 
not fully transparent, mysql on gfs can be I Think.
I'll keep the list updated when I get more benchmarks.

Best regards,
Sander

At 05:07 1-5-2006, you wrote:
>No matter what you do, a standalone server would be faster than a 
>clustered architecture, if it is GFS over SAN, or even the MySQL 
>cluster, due to obvious reasons of latency invloved. Anyway what 
>exactly are you trying to build with MySQL, I mean what kind of 
>performance you want from MySQL, may be you could try Replication or 
>if you want good scalability, then you would be better off with the 
>MySQL cluster.
>
>Kishore Jalleda
><http://kjalleda.googlepages.com/projects>http://kjalleda.googlepages.com/projects
>
>
>
>On 4/26/06, Sander van Beek - Elexis 
><<mailto:sander at elexis.nl>sander at elexis.nl > wrote:
> > Hi all,
> >
> > We did a quick benchmark on our 2 node rhel4 testcluster with gfs and
> > a gnbd storage server. The results were very sad. One of the nodes
> > (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when
> > running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node
> > GFS over GNBD setup and inserts on both nodes at the same time, we
> > only could do 80 inserts per second. I'm very interested in the
> > perfomance others got in a similar setup. Would the performance
> > increase when we use software based iscsi instead of gnbd?
> > Or should we simply buy SAN equipment? Does anyone have statistics to
> > compare a standalone mysql setup to a small gfs cluster using a san?
> >
> >
> > With best regards,
> > Sander van Beek
> >
> > ---------------------------------------
> >
> > Ing. S. van Beek
> > Elexis
> > Marketing 9
> > 6921 RE Duiven
> > The Netherlands
> >
> > Tel:    +31 (0)26 7110329
> > Mob:    +31 (0)6 28395109
> > Fax:    +31 (0)318 611112
> > Email: <mailto:sander at elexis.nl>sander at elexis.nl
> > Web:    <http://www.elexis.nl>http://www.elexis.nl
> >
> > --
> > Linux-cluster mailing list
> > <mailto:Linux-cluster at redhat.com>Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>No virus found in this incoming message.
>Checked by AVG Free Edition.
>Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date: 28-4-2006

Met vriendelijke groet,
Sander van Beek

---------------------------------------

Ing. S. van Beek
Elexis
Marketing 9
6921 RE Duiven

Tel:    +31 (0)26 7110329
Mob:    +31 (0)6 28395109
Fax:    +31 (0)318 611112
Email: sander at elexis.nl
Web:    http://www.elexis.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/edcfa6ed/attachment.htm>

From jparsons at redhat.com  Mon May  1 13:32:48 2006
From: jparsons at redhat.com (James Parsons)
Date: Mon, 01 May 2006 09:32:48 -0400
Subject: [Linux-cluster] 2nd try: fencing?
In-Reply-To: <20060430005245.GA76504@monsterjam.org>
References: <20060429015851.GB66106@monsterjam.org>	<1146278162.5933.12.camel@mechanism.localnet>
	<20060430005245.GA76504@monsterjam.org>
Message-ID: <44560E00.20806@redhat.com>

Jason wrote:

>>What you use for a fence method all depends on your hardware.  If you
>>give a quick explanation of your hardware setup, we might be able to
>>help you pick a fence device that will work with what you have already.
>>Or if you don't have anything that could be used to block access, you
>>might have to buy some network power switches. 
>>    
>>
>
>right now, all I have is 2 dell servers in a rack with identical configs. (dual ethernet 
>controllers and 1 separate controller for the heartbeat). 
>
Do your Dell servers have Drac support? RHCS supports Drac 4/I and 
DracIII/MC.

>Both are running 
>linux-ha and are both connected to a dell powervault 220S storage array which is configured so 
>that both hosts can access the drives concurrently (cluster mode). Im following the instructions 
>at 
>http://www.gyrate.org/archives/9  and am at step 17.. which says to configure CCS.
>
>I guess we could get an APC power switch, but what would you folks suggest? i.e. what model for 
>just a 2 cluster node (each server has 2 power supplies). Or is there a better way?
>
An AP7900 would probably work for you. If you use system-config-cluster 
to configure your cluster, it will detect that you are fencing each node 
twice with a 'power switch' type of fence on the same level and set the 
appropriate attributes for you in the cluster.conf file.

-J




From 14117614 at sun.ac.za  Mon May  1 14:01:18 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Mon, 1 May 2006 16:01:18 +0200
Subject: [Linux-cluster] < fecing with out any hardware? >
References: <20060429015851.GB66106@monsterjam.org>	<1146278162.5933.12.camel@mechanism.localnet>
	<20060430005245.GA76504@monsterjam.org> <44560E00.20806@redhat.com>
Message-ID: <2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za>

Hi...

I was wondering if it would be a good idea to fence without any type of hardware, besides the pc's.

At the moment I have about 6 machines that I want to have a gfs on these 6 machines but due to budget constraints I cant afford to by hardware. How is it possible to fence without the use of hardware, besides manual fencing.

These machines are your basic desktop pc's, each has a 80gig HD and a 3Ghz P4 processor. They are all connected by a 1 Gigabit switch.

Would if be possible to use GFS or is there any other variant. They all run FC5( Fedora Core 5 ). I need some sort of GFS, because we intend on setting up a mysql clustering system as well.

Any ideas would be greatly appreciated.



He who has a why to live can bear with almost any how.
Friedrich Nietzsche



-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of James Parsons
Sent: Mon 2006/05/01 03:32 PM
To: linux clustering
Subject: Re: [Linux-cluster] 2nd try: fencing?
 
Jason wrote:

>>What you use for a fence method all depends on your hardware.  If you
>>give a quick explanation of your hardware setup, we might be able to
>>help you pick a fence device that will work with what you have already.
>>Or if you don't have anything that could be used to block access, you
>>might have to buy some network power switches. 
>>    
>>
>
>right now, all I have is 2 dell servers in a rack with identical configs. (dual ethernet 
>controllers and 1 separate controller for the heartbeat). 
>
Do your Dell servers have Drac support? RHCS supports Drac 4/I and 
DracIII/MC.

>Both are running 
>linux-ha and are both connected to a dell powervault 220S storage array which is configured so 
>that both hosts can access the drives concurrently (cluster mode). Im following the instructions 
>at 
>http://www.gyrate.org/archives/9  and am at step 17.. which says to configure CCS.
>
>I guess we could get an APC power switch, but what would you folks suggest? i.e. what model for 
>just a 2 cluster node (each server has 2 power supplies). Or is there a better way?
>
An AP7900 would probably work for you. If you use system-config-cluster 
to configure your cluster, it will detect that you are fencing each node 
twice with a 'power switch' type of fence on the same level and set the 
appropriate attributes for you in the cluster.conf file.

-J


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 4078 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/5d8191ef/attachment.bin>

From ehimmel at burlingtontelecom.com  Mon May  1 14:18:14 2006
From: ehimmel at burlingtontelecom.com (Evan Himmel)
Date: Mon, 01 May 2006 14:18:14 -0000
Subject: [Linux-cluster] Cluster Suite
Message-ID: <43E0D124.4090004@burlingtontelecom.com>

I am installing Cluster Suite and GFS to RHEL Update 3.  What I noticed 
is that the modules are installing for kernel 2.6.9-22.ELsmp not the 
current kernel 2.6.9-34.ELsmp. Is there something I am missing?

-- 
Evan

__________________________________________________________________________________________________________________________________________________
Attention! This electronic message contains information that may be legally confidential and/or privileged. The information is intended solely for the individual or entity named above and access by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it.




From jparsons at redhat.com  Mon May  1 14:36:11 2006
From: jparsons at redhat.com (James Parsons)
Date: Mon, 01 May 2006 10:36:11 -0400
Subject: [Linux-cluster] < fecing with out any hardware? >
In-Reply-To: <2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za>
References: <20060429015851.GB66106@monsterjam.org>	<1146278162.5933.12.camel@mechanism.localnet>	<20060430005245.GA76504@monsterjam.org>
	<44560E00.20806@redhat.com>
	<2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za>
Message-ID: <44561CDB.4010506@redhat.com>

Pool Lee, Mr <14117614 at sun.ac.za> wrote:

>Hi...
>
>I was wondering if it would be a good idea to fence without any type of hardware, besides the pc's.
>
>At the moment I have about 6 machines that I want to have a gfs on these 6 machines but due to budget constraints I cant afford to by hardware. How is it possible to fence without the use of hardware, besides manual fencing.
>
>These machines are your basic desktop pc's, each has a 80gig HD and a 3Ghz P4 processor. They are all connected by a 1 Gigabit switch.
>
>Would if be possible to use GFS or is there any other variant. They all run FC5( Fedora Core 5 ). I need some sort of GFS, because we intend on setting up a mysql clustering system as well.
>
>Any ideas would be greatly appreciated.
>
If you plan on doing anything with your cluster other than just 
tinkering...that is, if you intend to do real work with it, then you 
just need fencing. It is a requirement for a sound cluster/GFS environment.

Here is a WTI unit
http://cgi.ebay.com/WTI-NPS-115-Telnet-Dial-Up-Network-Power-Switch_W0QQitemZ9717474832QQcategoryZ86723QQssPageNameZWDVWQQrdZ1QQcmdZViewItem

You can also find used APC switches there as well.

You just need fencing.

-J




From Bowie_Bailey at BUC.com  Mon May  1 15:42:39 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 1 May 2006 11:42:39 -0400 
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
Message-ID: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com>

Pool Lee, Mr <14117614 at sun.ac.za> wrote:
> I was wondering if it would be a good idea to fence without any type
> of hardware, besides the pc's. 
> 
> At the moment I have about 6 machines that I want to have a gfs on
> these 6 machines but due to budget constraints I cant afford to by
> hardware. How is it possible to fence without the use of hardware,
> besides manual fencing.   
> 
> These machines are your basic desktop pc's, each has a 80gig HD and a
> 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. 
> 
> Would if be possible to use GFS or is there any other variant. They
> all run FC5( Fedora Core 5 ). I need some sort of GFS, because we
> intend on setting up a mysql clustering system as well.  
> 
> Any ideas would be greatly appreciated.

You need to have some sort of fencing.  You can use manual fencing,
but it doesn't work well with production systems.  What happens is
that any problem in the cluster causes everything to come to a dead
stop and wait for you to fix the problem and then let the cluster know
it's ok to continue operation.

The only real option for a production system is some sort of hardware
or software that allows for the cluster to fence misbehaving nodes on
it's own.  The cheapest is a power switch.

-- 
Bowie



From cfeist at redhat.com  Mon May  1 16:14:22 2006
From: cfeist at redhat.com (Chris Feist)
Date: Mon, 01 May 2006 11:14:22 -0500
Subject: [Linux-cluster] Cluster Suite
In-Reply-To: <43E0D124.4090004@burlingtontelecom.com>
References: <43E0D124.4090004@burlingtontelecom.com>
Message-ID: <445633DE.3010808@redhat.com>

Where are you getting the Cluster Suite & GFS Rpms for?  The latest versions 
are built against the latest (2.6.9-34) kernel.  You should be able to find 
them on RHN.

Thanks!
Chris

Evan Himmel wrote:
> I am installing Cluster Suite and GFS to RHEL Update 3.  What I noticed 
> is that the modules are installing for kernel 2.6.9-22.ELsmp not the 
> current kernel 2.6.9-34.ELsmp. Is there something I am missing?
> 



From mwill at penguincomputing.com  Mon May  1 16:55:11 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Mon, 1 May 2006 09:55:11 -0700
Subject: [Linux-cluster] MySQL on GFS benchmarks
Message-ID: <433093DF7AD7444DA65EFAFE3987879C107DE3@jellyfish.highlyscyld.com>

I would be interested in any numbers you could provide, but make sure to
also state exactly what
the underlying hardware is, i.e. node model, cpu speed, ram speed and
size, ethernet switch, disk model etc.
 
Michael

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Sander van Beek -
Elexis
Sent: Monday, May 01, 2006 6:09 AM
To: linux clustering
Subject: Re: [Linux-cluster] MySQL on GFS benchmarks


Hi,

Ofcourse I understand that the performance will be less because of extra
overhead. My goal is to be as close to standalone server performance as
possible, with a certain budget in mind. One of the demands I have is
that the cluster solution I'm building is transparent to clients, highly
available, load balanced, and has to be scalable up to 2-8 servers. Both
replication and mysql-cluster are not fully transparent, mysql on gfs
can be I Think. 
I'll keep the list updated when I get more benchmarks.

Best regards,
Sander

At 05:07 1-5-2006, you wrote:


	No matter what you do, a standalone server would be faster than
a clustered architecture, if it is GFS over SAN, or even the MySQL
cluster, due to obvious reasons of latency invloved. Anyway what exactly
are you trying to build with MySQL, I mean what kind of performance you
want from MySQL, may be you could try Replication or if you want good
scalability, then you would be better off with the MySQL cluster. 
	
	Kishore Jalleda 
	http://kjalleda.googlepages.com/projects
	
	
	
	On 4/26/06, Sander van Beek - Elexis <sander at elexis.nl > wrote:
	> Hi all,
	> 
	> We did a quick benchmark on our 2 node rhel4 testcluster with
gfs and
	> a gnbd storage server. The results were very sad. One of the
nodes
	> (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second
when 
	> running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2
node
	> GFS over GNBD setup and inserts on both nodes at the same
time, we
	> only could do 80 inserts per second. I'm very interested in
the 
	> perfomance others got in a similar setup. Would the
performance
	> increase when we use software based iscsi instead of gnbd?
	> Or should we simply buy SAN equipment? Does anyone have
statistics to
	> compare a standalone mysql setup to a small gfs cluster using
a san?
	> 
	> 
	> With best regards,
	> Sander van Beek
	> 
	> ---------------------------------------
	> 
	> Ing. S. van Beek 
	> Elexis
	> Marketing 9
	> 6921 RE Duiven
	> The Netherlands
	> 
	> Tel:    +31 (0)26 7110329
	> Mob:    +31 (0)6 28395109
	> Fax:    +31 (0)318 611112
	> Email: sander at elexis.nl
	> Web:    http://www.elexis.nl
	> 
	> --
	> Linux-cluster mailing list
	> Linux-cluster at redhat.com 
	> https://www.redhat.com/mailman/listinfo/linux-cluster
	> 
	
	--
	Linux-cluster mailing list
	Linux-cluster at redhat.com
	https://www.redhat.com/mailman/listinfo/linux-cluster
	No virus found in this incoming message.
	Checked by AVG Free Edition.
	Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date:
28-4-2006

Met vriendelijke groet,
Sander van Beek

---------------------------------------

Ing. S. van Beek
Elexis
Marketing 9 
6921 RE Duiven

Tel:    +31 (0)26 7110329
Mob:    +31 (0)6 28395109
Fax:    +31 (0)318 611112
Email: sander at elexis.nl
Web:     http://www.elexis.nl
<http://www.elexis.nl/> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/24ab25c3/attachment.htm>

From omer at faruk.net  Mon May  1 19:11:18 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Mon, 1 May 2006 22:11:18 +0300 (EEST)
Subject: [Linux-cluster] HP msa20 and rhcs?
Message-ID: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147>


Hi,

I need a cheap shared storage and wanted to know if anyone in this list
used HP MSA20  (shared SCSI) on rhcs? I want to setup a 2 node cluster
with shared storage and cheapest HP shared storage seems to be MSA20..

Best Reagrads,

-- 
Omer Faruk Sen
http://www.faruk.net



From 14117614 at sun.ac.za  Mon May  1 19:45:55 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Mon, 1 May 2006 21:45:55 +0200
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
References: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com>
Message-ID: <2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za>


Hi..

What about software fencing? Is it really nesasary to be hardware!

Is there a difference between lutre/cfs, the product that sun uses, and gfs?

I'm planning to do mostly numerical work with the cluster and thus I would like all the machines to be able to 
retrieve data, as if it was local on the machine. NFS is very limited in this regard because we intend on using vast arrays of matrices, that can be up to 1-2 Gig.

I was hoping to implement GFS since all the machines are already setup, without the hardware fencing though.

Kind Regards
Lee



He who has a why to live can bear with almost any how.
Friedrich Nietzsche



-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Bowie Bailey
Sent: Mon 2006/05/01 05:42 PM
To: linux clustering
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
 
Pool Lee, Mr <14117614 at sun.ac.za> wrote:
> I was wondering if it would be a good idea to fence without any type
> of hardware, besides the pc's. 
> 
> At the moment I have about 6 machines that I want to have a gfs on
> these 6 machines but due to budget constraints I cant afford to by
> hardware. How is it possible to fence without the use of hardware,
> besides manual fencing.   
> 
> These machines are your basic desktop pc's, each has a 80gig HD and a
> 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. 
> 
> Would if be possible to use GFS or is there any other variant. They
> all run FC5( Fedora Core 5 ). I need some sort of GFS, because we
> intend on setting up a mysql clustering system as well.  
> 
> Any ideas would be greatly appreciated.

You need to have some sort of fencing.  You can use manual fencing,
but it doesn't work well with production systems.  What happens is
that any problem in the cluster causes everything to come to a dead
stop and wait for you to fix the problem and then let the cluster know
it's ok to continue operation.

The only real option for a production system is some sort of hardware
or software that allows for the cluster to fence misbehaving nodes on
it's own.  The cheapest is a power switch.

-- 
Bowie

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 3942 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/a1b1dcc5/attachment.bin>

From Bowie_Bailey at BUC.com  Mon May  1 20:15:43 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 1 May 2006 16:15:43 -0400 
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
Message-ID: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com>

Pool Lee, Mr <14117614 at sun.ac.za> wrote:
> 
> What about software fencing? Is it really nesasary to be hardware!
> 
> Is there a difference between lutre/cfs, the product that sun uses,
> and gfs? 
> 
> I'm planning to do mostly numerical work with the cluster and thus I
> would like all the machines to be able to retrieve data, as if it
> was local on the machine. NFS is very limited in this regard because
> we intend on using vast arrays of matrices, that can be up to 1-2
> Gig.  
> 
> I was hoping to implement GFS since all the machines are already
> setup, without the hardware fencing though. 

The thing with fencing is that you have to choose a method which is
supported by your configuration.  These are the basic ways to fence a
cluster:

Manual fencing - nothing special needed, but it doesn't work well in a
production environment.

Power fencing - Forcibly reboots a misbehaving node.  Requires a
compatible power switch.

Network fencing - Blocks the misbehaving node's access to the cluster
resources.  Requires a compatible switch (usually used with fiber
switches).

Software fencing - Notifies storage management software to block
access to the misbehaving node.  Requires compatible storage
configuration.  I believe this is only supported with GNBD storage
servers.

Your choices are limited by your configuration.  The only options that
can be used with any configuration are manual and power.

I don't know about the differences between the RedHat Clustering and
lutre/cfs.  I DO know that any type of clustering will require fencing
of some sort.

-- 
Bowie



From fgp at phlo.org  Mon May  1 20:30:45 2006
From: fgp at phlo.org (Florian G. Pflug)
Date: Mon, 01 May 2006 22:30:45 +0200
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <1146147436.12841.13.camel@merlin.Mines.EDU>
References: <44507BE8.20402@adelpha-lan.org>	<1146144991.2984.127.camel@ayanami.boston.redhat.com>	<4450CA79.9020400@adelpha-lan.org>
	<1146147436.12841.13.camel@merlin.Mines.EDU>
Message-ID: <44566FF5.1000602@phlo.org>

Matthew B. Brookover wrote:
> I have not used this tool in a while, but it did work on my system.
> 
> I would not trust this version to fence properly.  Using system does not
> allow the exit status of iptables to be checked for errors.  System only
> reports the status of the ssh command, not the command that is called on
> the remote host.
I believe ssh 'forwards' that exit-code of the remote-command - at
least the version that comes with debian/sarge does.

 > ssh fgp at dev '/bin/false'; echo $?
gives:
1

 > ssh fgp at dev '/bin/true'; echo $?
gives:
0

At least on my machine..

greetings, Florian Pflug



From mbrookov at mines.edu  Mon May  1 20:45:30 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Mon, 01 May 2006 14:45:30 -0600
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <44566FF5.1000602@phlo.org>
References: <44507BE8.20402@adelpha-lan.org>
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>
	<4450CA79.9020400@adelpha-lan.org>
	<1146147436.12841.13.camel@merlin.Mines.EDU>
	<44566FF5.1000602@phlo.org>
Message-ID: <1146516331.16843.9.camel@merlin.Mines.EDU>

Hmm, nice to know that.  I must have been thinking of rsh or something
else.  I would encourage you to test carefully.

Matt


On Mon, 2006-05-01 at 22:30 +0200, Florian G. Pflug wrote:

> Matthew B. Brookover wrote:
> > I have not used this tool in a while, but it did work on my system.
> > 
> > I would not trust this version to fence properly.  Using system does not
> > allow the exit status of iptables to be checked for errors.  System only
> > reports the status of the ssh command, not the command that is called on
> > the remote host.
> I believe ssh 'forwards' that exit-code of the remote-command - at
> least the version that comes with debian/sarge does.
> 
>  > ssh fgp at dev '/bin/false'; echo $?
> gives:
> 1
> 
>  > ssh fgp at dev '/bin/true'; echo $?
> gives:
> 0
> 
> At least on my machine..
> 
> greetings, Florian Pflug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060501/981b97f7/attachment.htm>

From placid at adelpha-lan.org  Mon May  1 20:47:52 2006
From: placid at adelpha-lan.org (Castang Jerome)
Date: Mon, 01 May 2006 22:47:52 +0200
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <1146516331.16843.9.camel@merlin.Mines.EDU>
References: <44507BE8.20402@adelpha-lan.org>	
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>	
	<4450CA79.9020400@adelpha-lan.org>	
	<1146147436.12841.13.camel@merlin.Mines.EDU>
	<44566FF5.1000602@phlo.org>
	<1146516331.16843.9.camel@merlin.Mines.EDU>
Message-ID: <445673F8.1060507@adelpha-lan.org>

Matthew B. Brookover a ?crit :
> Hmm, nice to know that.  I must have been thinking of rsh or something 
> else.  I would encourage you to test carefully.
>
> Matt
It seems to work..


-- 
Jerome CASTANG
Tel: 06.85.74.33.02
mail: jerome.castang at adelpha-lan.org

---------------------------------------------
RTFM !



From vcmarti at sph.emory.edu  Mon May  1 21:03:49 2006
From: vcmarti at sph.emory.edu (Vernard Martin)
Date: Mon, 01 May 2006 17:03:49 -0400
Subject: [Linux-cluster] Cluster Suite
In-Reply-To: <445633DE.3010808@redhat.com>
References: <43E0D124.4090004@burlingtontelecom.com> 
	<445633DE.3010808@redhat.com>
Message-ID: <445677B5.5070708@sph.emory.edu>

Chris Feist wrote:
> Where are you getting the Cluster Suite & GFS Rpms for?  The latest 
> versions are built against the latest (2.6.9-34) kernel.  You should 
> be able to find them on RHN.
I was trying to install RHCS & GFS as well but could only find the RHEL3 
version on RHN. Am I just looking in the wrong spot?

I found the SRPMs on the redhat site at 
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/4/en/RHCS/x86_64/SRPMS 
which apparently was last built again the
2.6.9-11 kernle as that is what it was looking for in the .spec files 
for cman-kernel and dlm-kernel.

am I looking in the wrong spot? If so, where is the correct spot?



-- 
Vernard Martin (vcmarti at sph.emory.edu) Applications Developer/Analyst
Information Services -- School of Public Health -- Emory University 



From cfeist at redhat.com  Mon May  1 22:12:51 2006
From: cfeist at redhat.com (Chris Feist)
Date: Mon, 01 May 2006 17:12:51 -0500
Subject: [Linux-cluster] Cluster Suite
In-Reply-To: <445677B5.5070708@sph.emory.edu>
References: <43E0D124.4090004@burlingtontelecom.com>
	<445633DE.3010808@redhat.com> <445677B5.5070708@sph.emory.edu>
Message-ID: <445687E3.7030906@redhat.com>

If you pay for Cluster Suite Entitlements for RHEL4, there should be a Cluster 
Suite channel under the main RHEL4 channel.

Otherwise you can download the upgraded SRPMS here:

ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4AS/en/RHGFS/SRPMS/

Thanks,
Chris

Vernard Martin wrote:
> Chris Feist wrote:
>> Where are you getting the Cluster Suite & GFS Rpms for?  The latest 
>> versions are built against the latest (2.6.9-34) kernel.  You should 
>> be able to find them on RHN.
> I was trying to install RHCS & GFS as well but could only find the RHEL3 
> version on RHN. Am I just looking in the wrong spot?
> 
> I found the SRPMs on the redhat site at 
> ftp://ftp.redhat.com/pub/redhat/linux/enterprise/4/en/RHCS/x86_64/SRPMS 
> which apparently was last built again the
> 2.6.9-11 kernle as that is what it was looking for in the .spec files 
> for cman-kernel and dlm-kernel.
> 
> am I looking in the wrong spot? If so, where is the correct spot?
> 
> 
> 



From prolay123 at yahoo.com  Tue May  2 08:29:17 2006
From: prolay123 at yahoo.com (prolay chatterjee)
Date: Tue, 2 May 2006 01:29:17 -0700 (PDT)
Subject: [Linux-cluster] Slow data writing rate
Message-ID: <20060502082917.3246.qmail@web60711.mail.yahoo.com>

Hi,
  I am administrating one site having IBM X255 server with IBM FASt T500 storage with fiber optic link using Qlogic 2300 controler.The site was set up few years back with RHEL AS 2.1.Now it is observed that when cluster service is up writing data in a cluster partition is very slow (in KBs) as it was expected to be at least in terms of 100MBs.The Qlogic speed is 2 GBPS.It also found that whenever cluster service is st oped and cluster partition mounted manually with general mount command data writing speed is nearly 1GBPS.Please suggest the solution of this problem.
  Regards,
  Prolay Chatterjee



		
---------------------------------
Love cheap thrills? Enjoy PC-to-Phone  calls to 30+ countries for just 2?/min with Yahoo! Messenger with Voice.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060502/801f5a27/attachment.htm>

From m.vit at exadron.com  Tue May  2 11:50:37 2006
From: m.vit at exadron.com (Vit Matteo)
Date: Tue, 02 May 2006 13:50:37 +0200
Subject: [Linux-cluster] problem with fencing
Message-ID: <4457478D.1050007@exadron.com>

Hi,
I've a system with 4 nodes and I can sucessfully mount a gfs partition. 
But if I shutdown one node, I can't access to the gfs partition (If I 
try to write something, the program that access to the gfs partition hangs).
I find in the logs

Apr 28 13:37:40 c0-21 fenced[4863]: fencing node "c0-28"
Apr 28 13:37:40 c0-21 fenced[4863]: fence "c0-28" failed

where c0-28 is the node powered off. If I power on c0-28, then I can 
access the gfs partition.

Is it correct ? Someone with the same problem ?

I use cluster-1.01.00 built from source with a 2.6.14 kernel. Every node 
has one vote. The fence agent is fence_manual.

Matteo Vit



From vlaurenz at advance.net  Tue May  2 23:35:14 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Tue, 02 May 2006 19:35:14 -0400
Subject: [Linux-cluster] RHEL4 and Cluster Suite
Message-ID: <4457ECB2.7020705@advance.net>

Hello all,
I'm new to Cluster Suite and I was wondering if there was a tutorial of 
some kind regarding the cluster.conf file.  I've read the Red Hat docs 
and they suggest using the GUI to configure, but I'm running strictly 
command line here and need to know how to properly write the XML.  I've 
only come across a couple of samples and was hoping someone could give 
give me (or point me to) a complete run down of valid tags and 
attributes.  Any help would be appreciated.

:::: Vito Laurenza



From gforte at leopard.us.udel.edu  Tue May  2 23:46:43 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 02 May 2006 19:46:43 -0400
Subject: [Linux-cluster] RHEL4 and Cluster Suite
In-Reply-To: <4457ECB2.7020705@advance.net>
References: <4457ECB2.7020705@advance.net>
Message-ID: <4457EF63.4050601@leopard.us.udel.edu>

Agreed!  I asked about this months ago, don't think I ever got a 
straight answer.  'course I suppose technically we could go wade through 
the code that reads the file to figure it out ourselves ... I'd rather 
see a document, though.  Maybe if I get unlazy I'll go do just that and 
write one.  Unless someone's got one handy ...

Vito, I wouldn't personally recommend the gui, anyway; I don't find it 
to be very robust, and you'll be better off learning to do it by hand in 
the long run.

-g

p.s. just to pick a nit, you can be "strictly command line" on a box and 
still run the gui tools remotely from another machine; you just need to 
have the X11, etc. packages installed but set the default run level to 3 
in /etc/inittab.

Vito Laurenza wrote:
> Hello all,
> I'm new to Cluster Suite and I was wondering if there was a tutorial of 
> some kind regarding the cluster.conf file.  I've read the Red Hat docs 
> and they suggest using the GUI to configure, but I'm running strictly 
> command line here and need to know how to properly write the XML.  I've 
> only come across a couple of samples and was hoping someone could give 
> give me (or point me to) a complete run down of valid tags and 
> attributes.  Any help would be appreciated.
> 
> :::: Vito Laurenza
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From vlaurenz at advance.net  Wed May  3 00:08:54 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Tue, 02 May 2006 20:08:54 -0400
Subject: [Linux-cluster] RHEL4 and Cluster Suite
In-Reply-To: <4457EF63.4050601@leopard.us.udel.edu>
References: <4457ECB2.7020705@advance.net>
	<4457EF63.4050601@leopard.us.udel.edu>
Message-ID: <4457F496.80203@advance.net>

Greg,

I'm glad I'm not the only one who can't find the info.  What I meant by 
"strictly command line" is that I have no desire to use the GUI.  :)

Let me know if you find anything.  I'll keep you posted as well.

:::: Vito Laurenza

Greg Forte wrote:
> Agreed!  I asked about this months ago, don't think I ever got a 
> straight answer.  'course I suppose technically we could go wade through 
> the code that reads the file to figure it out ourselves ... I'd rather 
> see a document, though.  Maybe if I get unlazy I'll go do just that and 
> write one.  Unless someone's got one handy ...
> 
> Vito, I wouldn't personally recommend the gui, anyway; I don't find it 
> to be very robust, and you'll be better off learning to do it by hand in 
> the long run.
> 
> -g
> 
> p.s. just to pick a nit, you can be "strictly command line" on a box and 
> still run the gui tools remotely from another machine; you just need to 
> have the X11, etc. packages installed but set the default run level to 3 
> in /etc/inittab.
> 
> Vito Laurenza wrote:
>> Hello all,
>> I'm new to Cluster Suite and I was wondering if there was a tutorial 
>> of some kind regarding the cluster.conf file.  I've read the Red Hat 
>> docs and they suggest using the GUI to configure, but I'm running 
>> strictly command line here and need to know how to properly write the 
>> XML.  I've only come across a couple of samples and was hoping someone 
>> could give give me (or point me to) a complete run down of valid tags 
>> and attributes.  Any help would be appreciated.
>>
>> :::: Vito Laurenza
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From Alain.Moulle at bull.net  Wed May  3 06:37:03 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 03 May 2006 08:37:03 +0200
Subject: [Linux-cluster] CS4 Update 2 / fencing and dump
Message-ID: <44584F8F.3020408@bull.net>

Hi
I'm facing a big problem using the CS4 with fence_ipmilan :
when a node is "crashing", the other node is fencing it poweroff/poweron
whereas the node was entering a dump process ... the reason is that
with many systems, when a machine dumps, the state is still "RUNNING",
and there is never a state "DUMPING".
So with this fencing method, I would never have a dump to
analyse a problem.
Did someone face this problem and has an idea or a workaround ?

Thanks
Alain Moull?





From christoph.thommen at bl.ch  Wed May  3 06:54:46 2006
From: christoph.thommen at bl.ch (Thommen, Christoph FKD)
Date: Wed, 3 May 2006 08:54:46 +0200
Subject: [Linux-cluster] which fence device?
Message-ID: <553B0E9C0C87D24A876E6B14FFE373D6BFD753@faimbx01.bl.ch>

Hi,

I'm looking for a power fencing switch for my 3-4 cluster nodes... which
switch do you use, can someone recommend one to me?

 

Thanks for your response

 

 

 

Greets

 

Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060503/10dd186d/attachment.htm>

From carlopmart at gmail.com  Wed May  3 10:32:10 2006
From: carlopmart at gmail.com (carlopmart)
Date: Wed, 03 May 2006 12:32:10 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <445123AE.4000204@redhat.com>
References: <4450F472.8050205@gmail.com>	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
	<445123AE.4000204@redhat.com>
Message-ID: <445886AA.5000608@gmail.com>

And what about Porliant DL 380??

Thanks

James Parsons wrote:
> rainer at ultra-secure.de wrote:
> 
>> Quoting carlopmart <carlopmart at gmail.com>:
>>
>>> Hi all,
>>>
>>>  Somebody can recommends me some HP servers to use with Redhat 
>>> Cluster Suite for RHEL 4?? My requeriments are:
>>>
>>>  - 4GB RAM
>>>  - Scsi disks
>>>  - Two CPUs
>>>  - iLO support for RHCS fence agent.
>>>
>>> I don't need shred storage.
>>
>>
>> Blades.
>> bl20p can be had very cheap nowadays, but should be enough for most 
>> tasks.
>> Downside: only two internal disks, the rest is via SAN (or iSCSI). 
> 
> I want to add a vote for the proliant bl* series. It uses iLO...not the 
> older Riloe cards, which have been problematic now and then.
> 
> -J
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From sanelson at gmail.com  Wed May  3 10:45:46 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 3 May 2006 11:45:46 +0100
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <445886AA.5000608@gmail.com>
References: <4450F472.8050205@gmail.com>
	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
	<445123AE.4000204@redhat.com> <445886AA.5000608@gmail.com>
Message-ID: <b6131fdc0605030345r74c9bb3dx2eaa8b6b232add72@mail.gmail.com>

On 5/3/06, carlopmart <carlopmart at gmail.com> wrote:

> And what about Porliant DL 380??

I use dl380s for low-end, dl580s and now 585s for upper-end clusters. 
Very very happy with them.

S.



From Timothy.Lin at noaa.gov  Wed May  3 10:50:41 2006
From: Timothy.Lin at noaa.gov (Timothy Lin)
Date: Wed, 03 May 2006 06:50:41 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <445886AA.5000608@gmail.com>
References: <4450F472.8050205@gmail.com>
	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
	<445123AE.4000204@redhat.com> <445886AA.5000608@gmail.com>
Message-ID: <44588B01.6000302@noaa.gov>

Blades are a lot more expensive than comparable 1U/2U boxes. Great if 
you are running out of space.
another good thing is, with proper SAN planning and boot-from san setup, 
you can do drop-in replacement when a blade fails.

we have good experience putting clustersuite on BL35p ( the half height 
blades ) , but GFS is another story.
(Might have something to do with MSA1500 we have, Redhat and HP are 
blaming each other on that issue)
 
iLO is pretty nice, but I think VMware consoles are easier to use :) 
Now if i can just figure out how to make GFS work properly in ESX server 
....

Tim.

>>>
>>>
>>> Blades.
>>> bl20p can be had very cheap nowadays, but should be enough for most 
>>> tasks.
>>> Downside: only two internal disks, the rest is via SAN (or iSCSI). 
>>
>>
>> I want to add a vote for the proliant bl* series. It uses iLO...not 
>> the older Riloe cards, which have been problematic now and then.
>>
>> -J
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>



From cosimo at streppone.it  Wed May  3 10:53:03 2006
From: cosimo at streppone.it (Cosimo Streppone)
Date: Wed, 03 May 2006 12:53:03 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <b6131fdc0605030345r74c9bb3dx2eaa8b6b232add72@mail.gmail.com>
References: <4450F472.8050205@gmail.com>	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>	<445123AE.4000204@redhat.com>
	<445886AA.5000608@gmail.com>
	<b6131fdc0605030345r74c9bb3dx2eaa8b6b232add72@mail.gmail.com>
Message-ID: <44588B8F.1090609@streppone.it>

Steve Nelson wrote:

> On 5/3/06, carlopmart <carlopmart at gmail.com> wrote:
> 
>> And what about Porliant DL 380??
> 
> I use dl380s for low-end, dl580s and now 585s for upper-end clusters. 
> Very very happy with them.

Do you use iLO port for fencing?
Please can you explain your iLO configuration?
I have some doubts on how to configure fencing.

Example:
you have nodes A,B and iLO devices Ai, Bi
Fencing device for node A should be Ai or Bi?

I had also troubles on installation of additional
perl modules required to make fence_ilo agent work:
despite having IO::Socket::SSL and Net::SSL::something
correctly installed, it keeps throwing error messages
and I can't seem to correctly startup fenced.

-- 
Cosimo



From kanderso at redhat.com  Wed May  3 15:56:58 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Wed, 03 May 2006 10:56:58 -0500
Subject: [Linux-cluster] Red Hat Summit - Cluster and Storage Talks
Message-ID: <1146671819.2876.56.camel@dhcp80-204.msp.redhat.com>

Hi all,

Caution - this is a shameless plug for some of the cluster developers.

First, I would like to apologize but we have been too focused on getting
the new cluster infrastructure integrated, and pushing GFS2/DLM
upstream, that we have not organized a cluster summit for this year.

However, some of the key architects of the cluster components are going
to be speaking at this years Red Hat Summit
( http://www.redhat.com/promo/summit/ ) at the end of the month of May
in Nashville. Dave Teigland, Steven Whitehouse, Steven Dake and Jim
Parsons are all on the schedule to speak.

        * Dave Teigland will be covering the evolution and exposure of
        the cluster components APIs including DLM, CMAN and CCS.  
        * Steve Whitehouse will describe the changes between GFS and
        GFS2, reasons behind the changes and share some details about
        the new layout.  
        * Steven Dake is going to cover the openais project, the
        integration of totem protocol into the core cluster
        infrastructure and cover the new high availability APIs that
        openais provides and some direction on where it is heading. 
        * Jim Parsons will describe the Conga project, which is going to
        provide the new management interfaces and infrastructure to make
        cluster and storage administration much simpler.
        
All of these presentations are currently scheduled for May 31, the first
day of the Red Hat Summit, and will include some Q&A time. We have not
set up any formal cluster group discussions, but if people were to have
an interest, I would imagine that there would be ample opportunities to
find a local establishment where all of these guys would be to have an
informal get together of cluster developers.  It is not often that all
of these guys are in the same country at the same time, so hopefully we
can take advantage of it.  

So, check out the web site for the Red Hat Summit, we are under the
Cluster and Storage track.  If you sign up (sorry, but there is a fee to
attend), either let me know or respond to the linux-cluster mailing
list.  If enough cluster developers are interested, we can be more
specific about where we will be hanging out, rather than just strolling
all over Nashville.

Thanks and hope to see you in Nashville.

Kevin Anderson
Director, Cluster and Storage Development
Red Hat
kanderso at redhat.com

Red Hat Summit
http://www.redhat.com/promo/summit/

Cluster Track 
http://www.redhat.com/promo/summit/tracks/#cluster



From m_list at eshine.de  Wed May  3 15:57:20 2006
From: m_list at eshine.de (Arnd)
Date: Wed, 03 May 2006 17:57:20 +0200
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
In-Reply-To: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com>
Message-ID: <4458D2E0.3080802@eshine.de>


Bowie Bailey wrote:
> Pool Lee, Mr <14117614 at sun.ac.za> wrote:
>> What about software fencing? Is it really nesasary to be hardware!
> 
> Your choices are limited by your configuration.  The only options that
> can be used with any configuration are manual and power.

I was testing a few possibilities of fencing. GFS expects from the 
fencing script the status "0" to decide if it was successfull. So you 
can specify any script by your own in the cluster.conf. (Please correct 
me, if I'm wrong)

This script can be an automatic login to the failed server (ssh, rlogin, 
serial console) which can execute any remote operation (for example 
unload the module of the SAN-device) or causing an kernel panic (which 
is the fencing-method in ocfs2 ;-) ).

Your fencing-script must assure that the failed host doesn't have access 
to the filesystem anymore!

Arnd



From Bowie_Bailey at BUC.com  Wed May  3 16:08:19 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 3 May 2006 12:08:19 -0400 
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
Message-ID: <4766EEE585A6D311ADF500E018C154E302684987@bnifex.cis.buc.com>

Arnd wrote:
> Bowie Bailey wrote:
> > Pool Lee, Mr <14117614 at sun.ac.za> wrote:
> > > What about software fencing? Is it really nesasary to be hardware!
> > 
> > Your choices are limited by your configuration.  The only options
> > that can be used with any configuration are manual and power.
> 
> I was testing a few possibilities of fencing. GFS expects from the
> fencing script the status "0" to decide if it was successfull. So you
> can specify any script by your own in the cluster.conf. (Please
> correct me, if I'm wrong)
> 
> This script can be an automatic login to the failed server (ssh,
> rlogin, serial console) which can execute any remote operation (for
> example unload the module of the SAN-device) or causing an kernel
> panic (which is the fencing-method in ocfs2 ;-) ).
> 
> Your fencing-script must assure that the failed host doesn't have
> access to the filesystem anymore!

I'm not an expert on the topic.  I just use the built-in stuff.  But
my understanding is that you can write your own fence script without
too much trouble.  You just have to be careful and make sure that it
is bulletproof.

If your script relies on an ssh connection to the failed server and
the failed server is not responding to ssh, then the fencing fails and
the entire cluster must stop and wait for manual intervention.

-- 
Bowie



From vlaurenz at advance.net  Wed May  3 21:16:39 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Wed, 03 May 2006 17:16:39 -0400
Subject: [Linux-cluster] RHEL4 and Cluster Suite
In-Reply-To: <4457ECB2.7020705@advance.net>
References: <4457ECB2.7020705@advance.net>
Message-ID: <44591DB7.9070706@advance.net>

...Also...

How do I configure Cluster Suite to notify (via email) on Heartbeat 
events, fence events, etc?  Thanks!

:::: Vito Laurenza

Vito Laurenza wrote:
> Hello all,
> I'm new to Cluster Suite and I was wondering if there was a tutorial of 
> some kind regarding the cluster.conf file.  I've read the Red Hat docs 
> and they suggest using the GUI to configure, but I'm running strictly 
> command line here and need to know how to properly write the XML.  I've 
> only come across a couple of samples and was hoping someone could give 
> give me (or point me to) a complete run down of valid tags and 
> attributes.  Any help would be appreciated.
> 
> :::: Vito Laurenza
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From herta.vandeneynde at cc.kuleuven.be  Thu May  4 07:25:57 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Thu, 04 May 2006 09:25:57 +0200
Subject: [Linux-cluster] umount failed - device is busy
In-Reply-To: <434C0FA7.9000803@cc.kuleuven.be>
References: <434A7ADE.108@cc.kuleuven.be>	<434A8FE6.40508@cc.kuleuven.be>	<1128963722.4680.21.camel@ayanami.boston.redhat.com>	<434AC9DE.50606@cc.kuleuven.be>	<1128978146.4680.37.camel@ayanami.boston.redhat.com>	<434ADB8C.9010508@cc.kuleuven.be>	<1129043197.4680.85.camel@ayanami.boston.redhat.com>	<434BDECD.2060303@cc.kuleuven.be>	<1129054711.4680.119.camel@ayanami.boston.redhat.com>
	<434C0FA7.9000803@cc.kuleuven.be>
Message-ID: <4459AC85.7020308@cc.kuleuven.be>

Herta Van den Eynde wrote:
> Lon Hohberger wrote:
> 
>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>
>>
>>> Bit of extra information:  the system that was running the services 
>>> got STONITHed by the other cluster member shortly before midnight.
>>> The services all failed over nicely, but the situation remains:  if I 
>>> try to stop or relocate a service, I get a "device is busy".
>>> I suppose that rules out an intermittent issue.
>>>
>>> There's no mounts below mounts.
>>
>>
>>
>> Drat.
>>
>> Nfsd is the most likely candidate for holding the reference.
>> Unfortunately, this is not something I can track down; you will have to
>> either file a support request and/or a Bugzilla.  When you get a chance,
>> you should definitely try stopping nfsd and seeing if that clears the
>> mystery references (allowing you to unmount).  If the problem comes from
>> nfsd, it should not be terribly difficult to track down.
>>
>> Also, you should not need to recompile your kernel to probe all the LUNs
>> per device; just edit /etc/modules.conf:
>>
>> options scsi_mod max_scsi_luns=128
>>
>> ... then run mkinitrd to rebuild the initrd image.
>>
>> -- Lon
> 
> Next maintenance window is 4 weeks away, so I won't be able to test the 
> nfsd hypothesis anytime soon.  In the meantime, I'll file a support 
> request.  I'll keep you posted.
> 
> At least the unexpected STONITH confirms that the failover still works.
> 
> The /etc/modules.conf tip is a big time saver.  Rebuilding the modules 
> takes forever.
> 
> Thanks, Lon.
> 
> Herta

Apologies for not updating this sooner.  (Thanks for remindeing me, Owen.)

During a later maintenance window, I shut down the cluster services, but 
it wasn't until I stopped the nfsd, that the filesystems could actually 
be unmounted, which seems to confirm Lon's theory about nfsd being the 
likely candidate for holding the reference.

I found a note elsewhere on the web where someone worked around the 
problem by stopping nfsd, stopping the service, restarting nfsd, and 
relocating the service.  Disadvantage being that all nfs services 
experience a minor interrupt at the time.

Anyway, my problem disappeared during the latest maintenance window. 
Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> 
nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so 
I'm not 100% sure which of the two fixed it, and curious though I am, I 
simply don't have the time to start reading the code.  If anyone has 
further insights, I'd love to read about it, though.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From mark at wormgoor.com  Thu May  4 18:16:29 2006
From: mark at wormgoor.com (Mark Wormgoor)
Date: Thu, 04 May 2006 20:16:29 +0200
Subject: [Linux-cluster] Sharing disk using gnbd
Message-ID: <445A44FD.4040604@wormgoor.com>

Hi,

I have a small network with 3 machines. All machines are FC5 as of 
yesterday. One machine is a server and has most of my storage. I'm 
currently sharing the disks using NFS, but am researching better ways of 
sharing my disks. My main reason for doing this is that I would like 
Posix semantics, but better performance over NFS would be a nice benefit.

I think my main options are gnbd (GFS), iscsi and ata-over-ethernet. 
Since GFS is best supported in Fedora, that was my first attempt. 
However, when going through the docs, I noticed that I could not mount 
the disk on the server itsself.
1. If you use GFS on the disk and mount it like that on the server, you 
have to share it using gnbd with nocache, which is a huge performance hit.
2. According to the gnbd docs, you should never import the disks on the 
machine they are exported on, so that's out as well.
Can this be true? Is gnbd unusable if you want to use the disk on the 
server? On the other hand, GFS is a bit overkill, since I don't need the 
clustering; I just want to share my disk.

However, for aoe and iscsi, I think there is no way of sharing the file 
system between multiple systems, which would make them unusable. 
Besides, I could not find rpms for aoe, and for iscsi I could only find 
the server rpm, not the client.

Is there a solution? Am I stuck with NFS?

Kind regards,

Mark



From saju8 at rediffmail.com  Thu May  4 19:06:51 2006
From: saju8 at rediffmail.com (saju  john)
Date: 4 May 2006 19:06:51 -0000
Subject: [Linux-cluster] Centralized Cron 
Message-ID: <20060504190651.30345.qmail@webmail10.rediffmail.com>

  

Dear All,

Is there any way to make a centalized cron while using Redhat HA cluster with Sahred storage. I mean to put the crontab entry for a particular user on shared storage, so that when the cluster shifts, on the other node cron should read from the cron file in shared storage.

This setup has the advantage that we don't need to manullay update the cron entry in both nodes.

I tried two ways , but not success

a) Make a soft link from /var/spool/cron/<user> to /path/to/shared/storage/<user>. This will work as long as I didn't make any changes to existing crontab. Once I make changes to crontab, the link is removed and file is created at /var/spool/cron/<user>

b) Soft link the cron directory in /var/spool to /path/to/shared/storage/cron. This is working till the cluster shift. The cron is getting dead when the cluster shifts as it lose the /var/spool/cron link's destination driectory which will be mapped to the other node

Thanks in advance

Saju John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060504/d65a6de9/attachment.htm>

From Bowie_Bailey at BUC.com  Thu May  4 19:31:49 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Thu, 4 May 2006 15:31:49 -0400 
Subject: [Linux-cluster] Sharing disk using gnbd
Message-ID: <4766EEE585A6D311ADF500E018C154E3026849A1@bnifex.cis.buc.com>

Mark Wormgoor wrote:
> Hi,
> 
> I have a small network with 3 machines. All machines are FC5 as of
> yesterday. One machine is a server and has most of my storage. I'm
> currently sharing the disks using NFS, but am researching better ways
> of sharing my disks. My main reason for doing this is that I would
> like Posix semantics, but better performance over NFS would be a nice
> benefit. 
> 
> I think my main options are gnbd (GFS), iscsi and ata-over-ethernet.
> Since GFS is best supported in Fedora, that was my first attempt.
> However, when going through the docs, I noticed that I could not mount
> the disk on the server itsself.
> 1. If you use GFS on the disk and mount it like that on the server,
> you have to share it using gnbd with nocache, which is a huge
> performance hit. 
> 2. According to the gnbd docs, you should never import the disks on
> the machine they are exported on, so that's out as well.
> Can this be true? Is gnbd unusable if you want to use the disk on the
> server? On the other hand, GFS is a bit overkill, since I don't need
> the clustering; I just want to share my disk.
> 
> However, for aoe and iscsi, I think there is no way of sharing the
> file system between multiple systems, which would make them unusable.
> Besides, I could not find rpms for aoe, and for iscsi I could only
> find the server rpm, not the client.

You DO need the clustering.  That is what you are doing with
GNBD/iSCSI/AoE.  You are allowing multiple computers to read/write
directly to the storage media.  This requires GFS and a cluster to
manage access and prevent the storage from becoming corrupted.  With
NFS, the clients only access the storage through the NFS server, so do
not need this.

AoE and iSCSI can be natively shared with as many computers as you can
connect up to the storage network.  I don't know where you can find
the iSCSI drivers, but for AoE, you can get them from
http://www.coraid.com/support/linux/.  It's not an rpm, but a small,
easily compiled tarball.  There may be an rpm version somewhere, but
I've always just compiled it myself.

I can't comment on the limitations of GNBD.  I've never used it
myself, so I'm not sure.

-- 
Bowie



From eric at bootseg.com  Thu May  4 19:43:36 2006
From: eric at bootseg.com (Eric Kerin)
Date: Thu, 04 May 2006 15:43:36 -0400
Subject: [Linux-cluster] Centralized Cron
In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com>
References: <20060504190651.30345.qmail@webmail10.rediffmail.com>
Message-ID: <1146771816.3407.24.camel@auh5-0479.corp.jabil.org>

On Thu, 2006-05-04 at 19:06 +0000, saju john wrote:
>   
> 
> Dear All,
> 
> Is there any way to make a centalized cron while using Redhat HA
> cluster with Sahred storage. I mean to put the crontab entry for a
> particular user on shared storage, so that when the cluster shifts, on
> the other node cron should read from the cron file in shared storage.
> 
> This setup has the advantage that we don't need to manullay update the
> cron entry in both nodes.
> 
> I tried two ways , but not success
> 
What I'm currently doing is creating wrapper scripts that check to see
if the clustered filesystem is mounted, then if it does, execute the
job.  This script is then placed in crontab.

The downside is that I have to update the crontab on all cluster nodes,
as well as copy the wrapper script to each node.


I've been toying with the idea of making an rgmanager aware cron, but
haven't worked out enough details of how it would work to write
something up for comments.


Thanks, 
Eric Kerin
eric at bootseg.com



From herta.vandeneynde at cc.kuleuven.be  Thu May  4 23:25:59 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Fri, 05 May 2006 01:25:59 +0200
Subject: [Linux-cluster] umount failed - device is busy
In-Reply-To: <4459AC85.7020308@cc.kuleuven.be>
References: <434A7ADE.108@cc.kuleuven.be>	<434A8FE6.40508@cc.kuleuven.be>	<1128963722.4680.21.camel@ayanami.boston.redhat.com>	<434AC9DE.50606@cc.kuleuven.be>	<1128978146.4680.37.camel@ayanami.boston.redhat.com>	<434ADB8C.9010508@cc.kuleuven.be>	<1129043197.4680.85.camel@ayanami.boston.redhat.com>	<434BDECD.2060303@cc.kuleuven.be>	<1129054711.4680.119.camel@ayanami.boston.redhat.com>	<434C0FA7.9000803@cc.kuleuven.be>
	<4459AC85.7020308@cc.kuleuven.be>
Message-ID: <445A8D87.5030900@cc.kuleuven.be>

Herta Van den Eynde wrote:
> Herta Van den Eynde wrote:
> 
>> Lon Hohberger wrote:
>>
>>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote:
>>>
>>>
>>>> Bit of extra information:  the system that was running the services 
>>>> got STONITHed by the other cluster member shortly before midnight.
>>>> The services all failed over nicely, but the situation remains:  if 
>>>> I try to stop or relocate a service, I get a "device is busy".
>>>> I suppose that rules out an intermittent issue.
>>>>
>>>> There's no mounts below mounts.
>>>
>>>
>>>
>>>
>>> Drat.
>>>
>>> Nfsd is the most likely candidate for holding the reference.
>>> Unfortunately, this is not something I can track down; you will have to
>>> either file a support request and/or a Bugzilla.  When you get a chance,
>>> you should definitely try stopping nfsd and seeing if that clears the
>>> mystery references (allowing you to unmount).  If the problem comes from
>>> nfsd, it should not be terribly difficult to track down.
>>>
>>> Also, you should not need to recompile your kernel to probe all the LUNs
>>> per device; just edit /etc/modules.conf:
>>>
>>> options scsi_mod max_scsi_luns=128
>>>
>>> ... then run mkinitrd to rebuild the initrd image.
>>>
>>> -- Lon
>>
>>
>> Next maintenance window is 4 weeks away, so I won't be able to test 
>> the nfsd hypothesis anytime soon.  In the meantime, I'll file a 
>> support request.  I'll keep you posted.
>>
>> At least the unexpected STONITH confirms that the failover still works.
>>
>> The /etc/modules.conf tip is a big time saver.  Rebuilding the modules 
>> takes forever.
>>
>> Thanks, Lon.
>>
>> Herta
> 
> 
> Apologies for not updating this sooner.  (Thanks for remindeing me, Owen.)
> 
> During a later maintenance window, I shut down the cluster services, but 
> it wasn't until I stopped the nfsd, that the filesystems could actually 
> be unmounted, which seems to confirm Lon's theory about nfsd being the 
> likely candidate for holding the reference.
> 
> I found a note elsewhere on the web where someone worked around the 
> problem by stopping nfsd, stopping the service, restarting nfsd, and 
> relocating the service.  Disadvantage being that all nfs services 
> experience a minor interrupt at the time.
> 
> Anyway, my problem disappeared during the latest maintenance window. 
> Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> 
> nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so 
> I'm not 100% sure which of the two fixed it, and curious though I am, I 
> simply don't have the time to start reading the code.  If anyone has 
> further insights, I'd love to read about it, though.
> 
> Kind regards,
> 
> Herta

Someone reported off line that they are experiencing the same problem 
while running the same versions we currently are.

So just for completeness sake: expecting problems, I also upped the 
clumanager log levels during the last maintenance window.  They are now at:
    clumembd   loglevel="6"
    cluquorumd loglevel="6"
    clurmtabd  loglevel="7"
    clusvcmgrd loglevel="6"
    clulockd   loglevel="6"

Come to think of it, I probably loosened the log levels during the
maintenance window when our problems began (I wanted to reduce the size
of the logs).  Not sure how - or even if - this might affect things, though.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From uli.schroeder at gmx.net  Fri May  5 06:25:25 2006
From: uli.schroeder at gmx.net (Uli Schroeder)
Date: Fri, 5 May 2006 08:25:25 +0200 (MEST)
Subject: [Linux-cluster] GFS: assertion "x <= length" failed
Message-ID: <8991.1146810325@www070.gmx.net>

Hi everyone,

I encounter the following problem with GFS unter RHEL4. Anyone familiar with
this one.

Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: fatal:
assertion "x <= length" failed
Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1:   function =
blkalloc_internal
Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1:   file =
/usr/src/build/574066-ia64/BUILD/gfs-kernel-2.6.9-35/src/gfs/rgrp.c, line =
1450
Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1:   time =
1146227066
Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: about to
withdraw from the cluster
Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: waiting for
outstanding I/O

What can I do against it? Anytime the error occurs all I get when I try to
access a directory on that volume is an "Input/Output error". The failure
occurs regularly and can only be resolved by booting the system.

Interestingly the error doesn't apply to all GFS volumes on a server. One
volume regularly fails while the other is up and running all the time. There
was no difference in setting them up.

Anyway the could be observed on different servers.

Best regards,
Uli

-- 
Echte DSL-Flatrate dauerhaft f?r 0,- Euro*!
"Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl



From cjk at techma.com  Fri May  5 12:50:50 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Fri, 5 May 2006 08:50:50 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7A@tmaemail.techma.com>

iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is
required
for iLO fencing. You need to put all the fence_ilo options in the "fence.ccs"
and the
option ' action = reboot (or off) for in the fence section of your nodes.ccs
file on 
RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler since the
config
is expected to be in the same file etc.

In either case, you need to make sure the web access to the iLO port is
working and that
you have a valid account in the iLO config (the built in Administrator
account will work)
Also, if you are using RHEL4 and an updated iLO firmware, you need to disable
power 
management for the machine due to a change in the way the iLO powers off the
machine. It
seems to try a nice shutdown by sending the machine into runlevel 6 instead
of just pulling
the carpet out from under it.

My suggestion to the fencing agent coders would be to issue a "power reset"
instead of a 
"power off" as a reset will in fact pull the plug, and is much faster (and
thereby "safer")
than a power off command.


Also, you can always add more than one fencing method to each node. For
instace, you can
fence the machine at the fibre port as well. I believe you need to manually
enable the port
again once you have determined that there is no problem etc.


Any specific problem you are having?



Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone
Sent: Wednesday, May 03, 2006 6:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite

Steve Nelson wrote:

> On 5/3/06, carlopmart <carlopmart at gmail.com> wrote:
> 
>> And what about Porliant DL 380??
> 
> I use dl380s for low-end, dl580s and now 585s for upper-end clusters. 
> Very very happy with them.

Do you use iLO port for fencing?
Please can you explain your iLO configuration?
I have some doubts on how to configure fencing.

Example:
you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should be
Ai or Bi?

I had also troubles on installation of additional perl modules required to
make fence_ilo agent work:
despite having IO::Socket::SSL and Net::SSL::something correctly installed,
it keeps throwing error messages and I can't seem to correctly startup
fenced.

--
Cosimo

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From proftpd at rodriges.spb.ru  Sat May  6 14:36:14 2006
From: proftpd at rodriges.spb.ru (proftpd at rodriges.spb.ru)
Date: Sat, 06 May 2006 18:36:14 +0400
Subject: [Linux-cluster] mount at other disk
Message-ID: <web-17555383@eltel.net>

Hello.

I'm using Vtrack as iSCSI target and 2 RHEL4 hosts as iSCSI
initiators. 2 RHEL in a cluster and have /dev/sda as iSCSI
attached targed. I make CLVM2
#pvcreate /dev/sda
#vgcreate test /dev/sda
#lvcreate -n test -L10G test

and GFS
#gfs_mkfs -p lock_dlm -t alpha:a -j 8 /dev/test/test

and successfully mount /dev/test/test at both machine. All
OK.

But then i'm increase the size of target at Vtrack. After I
remount iSCSI at 2 RHEL, i see that insteed /dev/sda target
become /dev/sdb!!! Of course, LVM2 wants to see /dev/sda as
PV. So I can't use data. 

What can I do to mount iSCSI targer always as /dev/sda?




From Jon.Stanley at savvis.net  Sun May  7 03:05:35 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Sat, 6 May 2006 22:05:35 -0500
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net>

 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bowie Bailey
> Sent: Wednesday, May 03, 2006 11:08 AM
> To: linux clustering
> Subject: RE: [Linux-cluster] RE: < fecing with out any hardware? >
> > 
> > This script can be an automatic login to the failed server (ssh,
> > rlogin, serial console) which can execute any remote operation (for
> > example unload the module of the SAN-device) or causing an kernel
> > panic (which is the fencing-method in ocfs2 ;-) ).
> > 

If you have such a script, it cannot be guaranteed to be successful.  If
the server is so misbehaving that it will not respond to ssh, then all
bets are off and this will never succeed.

The question that I have is that there is functionality in the SCSI-3
spec for Persistent Group Reservations.  Basically, what happens is that
each system that wants access to a disk puts a "reservation" and
"registration" on it.  A commercial clustering solution (Symantec) uses
this feature in order to do it's I/O fencing.

The initial reservation on the disk is "Write Exclusive Registrants
Only", meaning that if you are not registered to be on the disk, you
cannot write to it.  When the node comes up, upon synchronizing with all
of the other nodes, etc, it puts it's key onto the disk.  It can then
write to the disk, without any problem.  When the node dies, the
surviving node(s) see that, and eject the dead node, making it
physically impossible to write to the disk.

This of course requires support from the array to do it (it's a SCSI-3
standard, but not all arrays implement it), thereby limiting the choice
of storage to mid-to-high-end enterprise arrays.  

The question is why can't we use that as a fence mechanism, and do away
with the hardware poweroff stuff, if the array supports it?  Of course
the hardware poweroff stuff could be left in for older/lower end arrays,
etc, but I think that options are a Good Thing(TM).



From hirantha at vcs.informatics.lk  Mon May  8 04:54:03 2006
From: hirantha at vcs.informatics.lk (Hirantha Wijayawardena)
Date: Mon, 8 May 2006 10:54:03 +0600
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7A@tmaemail.techma.com>
Message-ID: <20060508045815.58D7F27C43@ux-mail.informatics.lk>

Hi Corey

I have something to get clarify with you about 'Disabling the Power
Management' (I'm not quite sure whether my question is lacks my knowledge on
HP servers)

If you disable the power Management, is it possible to boot/reboot/shutdown
the server from web-based SIM utility? I believe there should be interaction
between iLO and Power Management.

I hope my second question is posted and replied already - rebooting Vs power
off. Sending node to runlevel 6 - What if server hung long time before
network service down! Will other node wait or takeover the package ran on
that hung server - this will crash right?

Thanks in advance

- Hirantha




-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
Sent: Friday, May 05, 2006 6:51 PM
To: linux clustering
Subject: RE: [Linux-cluster] Recommended HP servers for cluster suite

iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is
required
for iLO fencing. You need to put all the fence_ilo options in the
"fence.ccs"
and the
option ' action = reboot (or off) for in the fence section of your nodes.ccs
file on 
RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler since
the
config
is expected to be in the same file etc.

In either case, you need to make sure the web access to the iLO port is
working and that
you have a valid account in the iLO config (the built in Administrator
account will work)
Also, if you are using RHEL4 and an updated iLO firmware, you need to
disable
power 
management for the machine due to a change in the way the iLO powers off the
machine. It
seems to try a nice shutdown by sending the machine into runlevel 6 instead
of just pulling
the carpet out from under it.

My suggestion to the fencing agent coders would be to issue a "power reset"
instead of a 
"power off" as a reset will in fact pull the plug, and is much faster (and
thereby "safer")
than a power off command.


Also, you can always add more than one fencing method to each node. For
instace, you can
fence the machine at the fibre port as well. I believe you need to manually
enable the port
again once you have determined that there is no problem etc.


Any specific problem you are having?



Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone
Sent: Wednesday, May 03, 2006 6:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite

Steve Nelson wrote:

> On 5/3/06, carlopmart <carlopmart at gmail.com> wrote:
> 
>> And what about Porliant DL 380??
> 
> I use dl380s for low-end, dl580s and now 585s for upper-end clusters. 
> Very very happy with them.

Do you use iLO port for fencing?
Please can you explain your iLO configuration?
I have some doubts on how to configure fencing.

Example:
you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should
be
Ai or Bi?

I had also troubles on installation of additional perl modules required to
make fence_ilo agent work:
despite having IO::Socket::SSL and Net::SSL::something correctly installed,
it keeps throwing error messages and I can't seem to correctly startup
fenced.

--
Cosimo

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From knelson at glasshouse.com  Mon May  8 10:15:31 2006
From: knelson at glasshouse.com (Kevin Nelson)
Date: Mon, 8 May 2006 11:15:31 +0100
Subject: [Linux-cluster] CLVMD
Message-ID: <C35C937A8C19E2438EEB7E6E6EFA18930AC84A@GHUK-EMAIL.uk.glasshousetech.com>

Setting up a cluster using RedHat ES4 Update 3. Installed cluster software (now running system config cluster 1.0.25), then installed GFS (6.1). Device mapper and LVM2 were part of the RedHat install. I have a cluster up and running, I can create a volume group, a logical volume and then a GFS volume. Read write fine. What I cannot do is share it in the cluster, CLVMD is not installed, the only information I can find to install is as part of the LVM2 make which allows you to make LVM2 with cluster options and CLVM but this fails on the ./configure 

Would appreciate any help if possible or if you need any further information let me know.
Thank you

Kevin Nelson
Systems Integration Consultant

GlassHouse Technologies (UK)
THE GLOBAL LEADER IN INDEPENDENT STORAGE SERVICES
Tel:    +44 1932 428812
Mob:  +44 7767 302108
Fax:   +44 2392 498853

http://www.glasshouse.com
 
Mailto:knelson at glasshouse.com
  
This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure.  If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please accept our apology. We should be obliged if you would telephone the sender on the above number or email them by return. Thank you. 
 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060508/0fcbf6c5/attachment.htm>

From pcaulfie at redhat.com  Mon May  8 12:10:11 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 08 May 2006 13:10:11 +0100
Subject: [Linux-cluster] CLVMD
In-Reply-To: <C35C937A8C19E2438EEB7E6E6EFA18930AC84A@GHUK-EMAIL.uk.glasshousetech.com>
References: <C35C937A8C19E2438EEB7E6E6EFA18930AC84A@GHUK-EMAIL.uk.glasshousetech.com>
Message-ID: <445F3523.2080109@redhat.com>

Kevin Nelson wrote:
> Setting up a cluster using RedHat ES4 Update 3. Installed cluster
> software (now running system config cluster 1.0.25), then installed GFS
> (6.1). Device mapper and LVM2 were part of the RedHat install. I have a
> cluster up and running, I can create a volume group, a logical volume
> and then a GFS volume. Read write fine. What I cannot do is share it in
> the cluster, CLVMD is not installed, the only information I can find to
> install is as part of the LVM2 make which allows you to make LVM2 with
> cluster options and CLVM but this fails on the ./configure
> 
> Would appreciate any help if possible or if you need any further
> information let me know.
> Thank you
> 

If you just need clvmd then you'll find it in the lvm2-cluster package.

If you really want to build it from sources then you'll need to post the
configure errors here. I suspect it's just some dependant packages that are
missing.
-- 

patrick



From lhh at redhat.com  Mon May  8 13:49:21 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 08 May 2006 09:49:21 -0400
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
In-Reply-To: <2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za>
References: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com>
	<2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za>
Message-ID: <1147096161.11396.29.camel@ayanami.boston.redhat.com>

Sorry for the late response.

On Mon, 2006-05-01 at 21:45 +0200, Pool Lee, Mr <14117614 at sun.ac.za>
wrote:
> Hi..
> 
> What about software fencing? Is it really nesasary to be hardware!

Fencing basically is using a device which is not directly controlled by
cluster nodes to ensure a given node is cut off from performing I/O,
thereby corrupting shared data.

> Is there a difference between lutre/cfs, the product that sun uses, and gfs?

I have not read much about Sun's product(s), but GFS is significantly
different architecturally from Lustre.

http://lustre.org/architecture.html

GFS has no metadata or data servers per se (though, when using gulm, you
have a 'lock' server); all nodes are accessing the same block devices
directly.


> I'm planning to do mostly numerical work with the cluster and thus I would like all the machines to be able to 
> retrieve data, as if it was local on the machine. NFS is very limited in this regard because we intend on using vast arrays of matrices, that can be up to 1-2 Gig.

You can use GFS and export the same NFS volume from multiple servers if
you need to, which helps eliminate the single-NFS-server bottleneck.
(In this case, you only need to set up fencing for the GFS cluster.)

Or, you can connect all the nodes in your cluster to the same block
devices and use GFS directly.


> I was hoping to implement GFS since all the machines are already setup, without the hardware fencing though.

Well, you /can/ do this, but if a node hangs and comes back to life,
plan on rebooting the entire cluster, recreating the file system from
scratch, and restarting your calculations.

-- Lon



From lhh at redhat.com  Mon May  8 14:11:13 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 08 May 2006 10:11:13 -0400
Subject: [Linux-cluster] HP msa20 and rhcs?
In-Reply-To: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147>
References: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147>
Message-ID: <1147097473.11396.46.camel@ayanami.boston.redhat.com>

On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote:
> Hi,
> 
> I need a cheap shared storage and wanted to know if anyone in this list
> used HP MSA20  (shared SCSI) on rhcs? I want to setup a 2 node cluster
> with shared storage and cheapest HP shared storage seems to be MSA20..

I've never used one; maybe someone else has.

General rules of thumb when using SCSI shared storage:

* If it requires a specific controller to make the RAID work, it
probably is not a good bet, regardless of what the marketing literature
would have you believe.

* If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and
still has some way of accessing the array management tools (ex: a serial
port) for configuring/presenting the LUNs, it should generally "just
work".

There are undoubtedly exceptions to these rules.  I'm not at all
familiar with the MSA20.

I do, however, have a MSA500 which has been working fine.  The MSA500
needs CCISS controllers to talk to the on-board MSA controller and
configure the LUNs during bootup.  After that, the CCISS controllers act
as "dumb" SCSI cards when talking to the MSA500, or so it seems.  It's
been working fine in a 2-node failover cluster for a couple of years,
but I have not tried it with GFS.

Whether that means the MSA20 will work... I do not know.  It might ;)

-- Lon



From teigland at redhat.com  Mon May  8 14:19:02 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 8 May 2006 09:19:02 -0500
Subject: [Linux-cluster] RE: < fecing with out any hardware? >
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <20060508141902.GB21898@redhat.com>

On Sat, May 06, 2006 at 10:05:35PM -0500, Stanley, Jon wrote:
> The question that I have is that there is functionality in the SCSI-3
> spec for Persistent Group Reservations.  Basically, what happens is that
> each system that wants access to a disk puts a "reservation" and
> "registration" on it.  A commercial clustering solution (Symantec) uses
> this feature in order to do it's I/O fencing.
> 
> The initial reservation on the disk is "Write Exclusive Registrants
> Only", meaning that if you are not registered to be on the disk, you
> cannot write to it.  When the node comes up, upon synchronizing with all
> of the other nodes, etc, it puts it's key onto the disk.  It can then
> write to the disk, without any problem.  When the node dies, the
> surviving node(s) see that, and eject the dead node, making it
> physically impossible to write to the disk.
> 
> This of course requires support from the array to do it (it's a SCSI-3
> standard, but not all arrays implement it), thereby limiting the choice
> of storage to mid-to-high-end enterprise arrays.  
> 
> The question is why can't we use that as a fence mechanism, and do away
> with the hardware poweroff stuff, if the array supports it?  Of course
> the hardware poweroff stuff could be left in for older/lower end arrays,
> etc, but I think that options are a Good Thing(TM).

You could definately use persistent reservations to do fencing, we just
don't have a fencing agent written for it yet.  It's one of those things
that no one ever quite gets the time to do.  It's something that would be
_really_ nice to have and would spare a lot of people a lot of hassle.

Dave



From cjk at techma.com  Mon May  8 15:36:59 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Mon, 8 May 2006 11:36:59 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7D@tmaemail.techma.com>


I believe you can simply turn acpid off to disable the power management. 

As far as SIM and shutdown goes, you should still be able to. The power
management stuff is just that part that intercepts front panel power button
hits and sends the computer into shutdown instead of a hard power off. It's 
useful for headless machines so you don't have to remote login, or connect
a head just to power down a machine. The problem is that the iLO function 
for powering off the machine, is the equivelent of a button push, and
therefore
sends the machine into init 6.

As far as taking a while for the machine to shutdown, that's why in my
message
I suggested doing a "power reset" rather than an "power off" since a power
reset
actually pulls the carpet out from under the machine no matter what.


Regards


Corey



>I have something to get clarify with you about 'Disabling the Power
Management' (I'm not quite sure whether >my question is lacks my knowledge on
HP servers)

>If you disable the power Management, is it possible to boot/reboot/shutdown
the server from web-based SIM >utility? I believe there should be interaction
between iLO and Power Management.

>I hope my second question is posted and replied already - rebooting Vs power
off. Sending node to runlevel 6 >- What if server hung long time before
network service down! Will other node wait or takeover the package ran on
that hung server - this will crash right?



-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
Sent: Friday, May 05, 2006 6:51 PM
To: linux clustering
Subject: RE: [Linux-cluster] Recommended HP servers for cluster suite

iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is
required for iLO fencing. You need to put all the fence_ilo options in the
"fence.ccs"
and the
option ' action = reboot (or off) for in the fence section of your nodes.ccs
file on 
RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler 
RHEL3+since
the
config
is expected to be in the same file etc.

In either case, you need to make sure the web access to the iLO port is
working and that you have a valid account in the iLO config (the built in
Administrator account will work) Also, if you are using RHEL4 and an updated
iLO firmware, you need to disable power management for the machine due to a
change in the way the iLO powers off the machine. It seems to try a nice
shutdown by sending the machine into runlevel 6 instead of just pulling the
carpet out from under it.

My suggestion to the fencing agent coders would be to issue a "power reset"
instead of a
"power off" as a reset will in fact pull the plug, and is much faster (and
thereby "safer") than a power off command.


Also, you can always add more than one fencing method to each node. For
instace, you can
fence the machine at the fibre port as well. I believe you need to manually
enable the port
again once you have determined that there is no problem etc.


Any specific problem you are having?



Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone
Sent: Wednesday, May 03, 2006 6:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite

Steve Nelson wrote:

> On 5/3/06, carlopmart <carlopmart at gmail.com> wrote:
> 
>> And what about Porliant DL 380??
> 
> I use dl380s for low-end, dl580s and now 585s for upper-end clusters. 
> Very very happy with them.

Do you use iLO port for fencing?
Please can you explain your iLO configuration?
I have some doubts on how to configure fencing.

Example:
you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should
be
Ai or Bi?

I had also troubles on installation of additional perl modules required to
make fence_ilo agent work:
despite having IO::Socket::SSL and Net::SSL::something correctly installed,
it keeps throwing error messages and I can't seem to correctly startup
fenced.

--
Cosimo

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Mon May  8 15:43:46 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Mon, 8 May 2006 11:43:46 -0400
Subject: [Linux-cluster] HP msa20 and rhcs?
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7E@tmaemail.techma.com>

The MSA20 is only a disk shelf. You'd need to have it conneted to a raid
controller
which is built into the DL360 and above, or simply access the individual
drives 
themselves. It does allow multi-initiator connections, but I think it's more
along 
the lines of having multiple paths to an MSA500 which is a two node non-fibre
SAN
if it can even be considered a SAN since no fibre is involved. It's not more
than
a high end external storage device.



Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Monday, May 08, 2006 10:11 AM
To: omer at faruk.net; linux clustering
Subject: Re: [Linux-cluster] HP msa20 and rhcs?

On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote:
> Hi,
> 
> I need a cheap shared storage and wanted to know if anyone in this 
> list used HP MSA20  (shared SCSI) on rhcs? I want to setup a 2 node 
> cluster with shared storage and cheapest HP shared storage seems to be
MSA20..

I've never used one; maybe someone else has.

General rules of thumb when using SCSI shared storage:

* If it requires a specific controller to make the RAID work, it probably is
not a good bet, regardless of what the marketing literature would have you
believe.

* If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and still
has some way of accessing the array management tools (ex: a serial
port) for configuring/presenting the LUNs, it should generally "just work".

There are undoubtedly exceptions to these rules.  I'm not at all familiar
with the MSA20.

I do, however, have a MSA500 which has been working fine.  The MSA500 needs
CCISS controllers to talk to the on-board MSA controller and configure the
LUNs during bootup.  After that, the CCISS controllers act as "dumb" SCSI
cards when talking to the MSA500, or so it seems.  It's been working fine in
a 2-node failover cluster for a couple of years, but I have not tried it with
GFS.

Whether that means the MSA20 will work... I do not know.  It might ;)

-- Lon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cosimo at streppone.it  Mon May  8 19:31:14 2006
From: cosimo at streppone.it (Cosimo Streppone)
Date: Mon, 08 May 2006 21:31:14 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7A@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7A@tmaemail.techma.com>
Message-ID: <445F9C82.2030801@streppone.it>

Kovacs, Corey J. wrote:

> iLO fencing works just fine.
> [...]
 > If you are using RHEL4 + GFS 6.1, then it is simpler since the
> config is expected to be in the same file etc.
>
 > [...]

I seem to have got past the SSL modules installation,
so that is not the problem.

Thanks for sharing your experience, but I admit I still haven't
understood when fencing takes place. What are the conditions that
trigger fencing?

> Any specific problem you are having?

Yes.
The main problem is that I'm now beginning to find my way through RHCS4. :-)
Other random problems that I had:

- oom-killer kernel thread killed my ccs daemon, causing the
   entire two-node cluster to suddenly become unmanageable;
- start/stop of shared filesystem resources (SAN) is causing errors
   and is therefore not managed properly;
- don't know how to properly configure heartbeat;

I know these are not iLO problem. In fact, I'm trying to
solve one problem at a time, and don't know if iLO fencing
can be the cause of these problems.

I need to do some more researching. I'll be back with
more useful info.

-- 
Cosimo



From cjk at techma.com  Mon May  8 20:13:37 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Mon, 8 May 2006 16:13:37 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7F@tmaemail.techma.com>

Cosimo, fencing takes place any time a condition exists, where "the cluster" 
cannot communicate with a node, or cannot gaurentee the state of a particular

node. Other can surely do the question more justtice but in a nutshell that's
it. To test this, simply pull the netwrok cable from a node. The others will 
not be able to status it and it will get fenced. Same thing happens if you
call 'fence_node node01' or whatever your nodes are named. The machine will
actually be booted _twice_. Once from the command, then another when the 
cluster decides it's no longer talking. I think the fence command should at
least have an option to inform the cluster that a node was fenced but it's
not
a big deal.


If oom is killing your cluster nodes, I think your out of luck. GFS can
gobble 
memory from my experience. More is better..  Also, in GFS 6.0x there is a bug
that causes system ram to be exhausted by GFS locks. The newest release has a
tunable paramter "inoded_purge" which allows you to tune a periodic
percentage
of locks to try and purge. This helped me a LOT. I was having nodes hang cuz
nodes
could not fork.

BTW, if the GFS folks are reading this, I'd like ot make a suggestion. I have
not
gone code diving yet but it seems that if the mechanism for a node to respond
was
actually spawning a thread or something that required the system to be able
to fork
then systems that are starved of memory would indeed get fenced since the
"OK" response
would not get back to the cluster. I realize that doesn't FIX anything per
se' but 
it would prevent the system from hanging for any length of time.

On the start/stop of SAN resources, what exactly do you mean? It sounds like
you
are talking about what happens when qlogic drivers load and unload. If that's
the
case, you need to properly set up zoning on your fibre switch. The
load/unload of
the qlogic drivers causes a scsi reset to be sent along the bus, which in the
case
of fibre channel, is every device in the fibre mesh. You need to set up
individual 
zones for your storage ports, then zones which include the host ports, and
the storage
together. So on a 5 node cluster, you'd end up with 5 zones, one for storage,
and 4
host/storage combos, then make them all part of the active config. That way
any scsi 
resets are not seen by other nodes HBA's.  I had problems that were causeing
nodes
to go down due to lost connections to the storage from the scsi resets, not
good....


Heartbeat should not need any tweaking if everything else is working. Not to
say you
can't tune it to your situation, just that it should be fine with default
settings while
you get things stable.



Hope this helps


Corey



-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone
Sent: Monday, May 08, 2006 3:31 PM
To: linux clustering
Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite

Kovacs, Corey J. wrote:

> iLO fencing works just fine.
> [...]
 > If you are using RHEL4 + GFS 6.1, then it is simpler since the
> config is expected to be in the same file etc.
>
 > [...]

I seem to have got past the SSL modules installation, so that is not the
problem.

Thanks for sharing your experience, but I admit I still haven't understood
when fencing takes place. What are the conditions that trigger fencing?

> Any specific problem you are having?

Yes.
The main problem is that I'm now beginning to find my way through RHCS4. :-)
Other random problems that I had:

- oom-killer kernel thread killed my ccs daemon, causing the
   entire two-node cluster to suddenly become unmanageable;
- start/stop of shared filesystem resources (SAN) is causing errors
   and is therefore not managed properly;
- don't know how to properly configure heartbeat;

I know these are not iLO problem. In fact, I'm trying to solve one problem at
a time, and don't know if iLO fencing can be the cause of these problems.

I need to do some more researching. I'll be back with more useful info.

--
Cosimo

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From jason at monsterjam.org  Tue May  9 01:32:13 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 8 May 2006 21:32:13 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
Message-ID: <20060509013213.GA91908@monsterjam.org>

so still following instructions at 
http://www.gyrate.org/archives/9
im at the part that says

"# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"

in my config, I have the dell PERC 4/DC cards, and I believe the logical drive showed up as 
/dev/sdb 

so do I need to create a partition on this logical drive with fdisk first before I run

 ccs_tool create /root/cluster  /dev/sdb1

or am I totally off track here?

i did ccs_tool create /root/cluster /dev/sdb
and it seemed to work fine, but doesnt seem right..

Jason



From cosimo at streppone.it  Tue May  9 09:09:34 2006
From: cosimo at streppone.it (Cosimo Streppone)
Date: Tue, 09 May 2006 11:09:34 +0200
Subject: [Linux-cluster] Interesting cluster case after a node hardware
	failure
Message-ID: <44605C4E.7010103@streppone.it>

I'm through an interesting case that I don't fully understand.
I found some log messages that I never saw before, that are quite
worrying. I report them here for quick reference, then I'm going
to include full log extract with comments on what I think has happened.
Please correct me when I'm wrong.

This is on RedHat Enterprise ES 4 Update 3 with
RHCS 4, with hand-modified init scripts to make them work with CS4.

   node2 kernel: CMAN: too many transition restarts - will die
   node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view
   node2 kernel: WARNING: dlm_emergency_shutdown
   node2 kernel: WARNING: dlm_emergency_shutdown
   node2 kernel: SM: 00000001 sm_stop: SG still joined
   node2 kernel: SM: 01000003 sm_stop: SG still joined
   node2 kernel: SM: 03000002 sm_stop: SG still joined
   node2 clurgmgrd[2820]: <warning> #67: Shutting down uncleanly
   node2 ccsd[2473]: Cluster manager shutdown.  Attemping to reconnect...
   node2 ccsd[2473]: Cluster is not quorate.  Refusing connection.
   node2 ccsd[2473]: Error while processing connect: Connection refused
   node2 ccsd[2473]: Invalid descriptor specified (-111).
   node2 ccsd[2473]: Someone may be attempting something evil.
   node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   node2 ccsd[2473]: Invalid descriptor specified (-111).
   node2 ccsd[2473]: Someone may be attempting something evil.
   node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   node2 ccsd[2473]: Invalid descriptor specified (-21).
   node2 ccsd[2473]: Someone may be attempting something evil.
   node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor
   node2 clurgmgrd: [2820]: <info> Executing /etc/rc.d/init.d/xinetd stop

Cluster is composed of two nodes (node1 and node2), two HP DL360 machines
with iLO devices configured for fencing but not connected for now.
There is one service only which has several shared resources attached
(fs, init scripts, and 1 ip address).

As I said, I attached an extract of "messages" log that shows a series
of events which led to malfunctioning clustered service.

Please can anyone shed some light on this?
Thank you for any suggestion.


Trace begins. Node1 dies of hardware failure.

---------8<--------------------8<---------------

   May  8 15:30:20 node2 kernel: CMAN: removing node node1 from the cluster : Missed too many heartbeats
   May  8 15:30:20 node2 fenced[2540]: node1 not a cluster member after 0 sec post_fail_delay
   May  8 15:30:20 node2 fenced[2540]: fencing node "node1"
   May  8 15:30:23 node2 fenced[2540]: agent "fence_ilo" reports: connect: No route to host at 
/opt/perl/lib/site_perl/5.8.6/linux/Net/SSL
   May  8 15:30:23 node2 fenced[2540]: fence "node1" failed

Fencing could never work here because iLO interface of node1
was down (and *not* connected, FWIW).

   [...]

   May  8 15:31:55 node2 kernel: CMAN: node node1 rejoining
   May  8 15:31:59 node2 clurgmgrd[2820]: <info> Magma Event: Membership Change
   May  8 15:31:59 node2 clurgmgrd[2820]: <info> State change: node1 DOWN
   May  8 15:31:59 node2 clurgmgrd: [2820]: <info> Executing /etc/rc.d/init.d/xinetd status

Cluster recognizes the node1 is down. Ok.
I didn't understand the "rejoining" though. Node1 was down.

   [...]

   May  8 15:36:40 node2 kernel: CMAN: too many transition restarts - will die
   May  8 15:36:40 node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view
   May  8 15:36:40 node2 kernel: WARNING: dlm_emergency_shutdown
   May  8 15:36:40 node2 kernel: WARNING: dlm_emergency_shutdown
   May  8 15:36:40 node2 kernel: SM: 00000001 sm_stop: SG still joined
   May  8 15:36:40 node2 kernel: SM: 01000003 sm_stop: SG still joined
   May  8 15:36:40 node2 kernel: SM: 03000002 sm_stop: SG still joined
   May  8 15:36:40 node2 clurgmgrd[2820]: <warning> #67: Shutting down uncleanly
   May  8 15:36:40 node2 ccsd[2473]: Cluster manager shutdown.  Attemping to reconnect...
   May  8 15:36:40 node2 ccsd[2473]: Cluster is not quorate.  Refusing connection.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing connect: Connection refused
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-21).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor
   May  8 15:36:40 node2 clurgmgrd: [2820]: <info> Executing /etc/rc.d/init.d/xinetd stop

 From now on, all services are being shut down by the cluster resource manager daemon.
But what could have happened that triggered a `dlm_emergency_shutdown'?

   May  8 15:36:40 node2 ccsd[2473]: Cluster is not quorate.  Refusing connection.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing connect: Connection refused
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor
   May  8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-21).
   May  8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil.
   May  8 15:36:40 node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor

   [...]

All services were shut down, shared ip address was released
and SAN volume unmounted.

   May  8 15:37:26 node2 ccsd[2473]: Unable to connect to cluster infrastructure after 60 seconds.
   ...
   May  8 15:41:26 node2 ccsd[2473]: Unable to connect to cluster infrastructure after 300 seconds.
   ...

The morning after, the node2 was rebooted. The shut down is not clean, but rebooting has
restored the cluster in a consistent state.
Node1 is not accessible due to complete hardware failure.

   May  9 09:08:05 node2 fenced: Stopping fence domain:
   May  9 09:08:05 node2 fenced: shutdown failed
   May  9 09:08:05 node2 fenced: ESC[60G
   May  9 09:08:05 node2 fenced:
   May  9 09:08:05 node2 rc: Stopping fenced:  failed
   May  9 09:08:05 node2 lock_gulmd: Stopping lock_gulmd:
   May  9 09:08:05 node2 lock_gulmd: shutdown succeeded
   May  9 09:08:05 node2 lock_gulmd: ESC[60G
   May  9 09:08:05 node2 lock_gulmd:
   May  9 09:08:05 node2 rc: Stopping lock_gulmd:  succeeded
   May  9 09:08:05 node2 cman: Stopping cman:
   May  9 09:08:08 node2 cman: failed to stop cman failed
   May  9 09:08:08 node2 cman: ESC[60G
   May  9 09:08:08 node2 cman:
   May  9 09:08:08 node2 rc: Stopping cman:  failed
   May  9 09:08:08 node2 ccsd: Stopping ccsd:
   May  9 09:08:08 node2 ccsd[2473]: Stopping ccsd, SIGTERM received.
   May  9 09:08:09 node2 ccsd: shutdown succeeded
   May  9 09:08:09 node2 ccsd: ESC[60G
   May  9 09:08:09 node2 ccsd:
   May  9 09:08:09 node2 rc: Stopping ccsd:  succeeded
   May  9 09:08:09 node2 irqbalance: irqbalance shutdown succeeded
   May  9 09:08:09 node2 multipathd: mpath0: stop event checker thread
   May  9 09:08:09 node2 multipathd: multipathd shutdown succeeded
   May  9 09:08:09 node2 kernel: Kernel logging (proc) stopped.
   May  9 09:08:09 node2 kernel: Kernel log daemon terminating.
   May  9 09:08:10 node2 syslog: klogd shutdown succeeded
   May  9 09:08:10 node2 exiting on signal 15

---------8<--------------------8<---------------

End of trace.

?

-- 
Cosimo



From pcaulfie at redhat.com  Tue May  9 09:22:43 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 09 May 2006 10:22:43 +0100
Subject: [Linux-cluster] Interesting cluster case after a node hardware
	failure
In-Reply-To: <44605C4E.7010103@streppone.it>
References: <44605C4E.7010103@streppone.it>
Message-ID: <44605F63.8080404@redhat.com>

Cosimo Streppone wrote:
> I'm through an interesting case that I don't fully understand.
> I found some log messages that I never saw before, that are quite
> worrying. I report them here for quick reference, then I'm going
> to include full log extract with comments on what I think has happened.
> Please correct me when I'm wrong.
> 
> This is on RedHat Enterprise ES 4 Update 3 with
> RHCS 4, with hand-modified init scripts to make them work with CS4.
> 
>   node2 kernel: CMAN: too many transition restarts - will die
>   node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view


This is a known bug and I'm currently testing a fix for it.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777
-- 

patrick



From omer at faruk.net  Tue May  9 11:04:33 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Tue, 9 May 2006 14:04:33 +0300 (EEST)
Subject: [Linux-cluster] HP msa20 and rhcs?
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7E@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E7E@tmaemail.techma.com>
Message-ID: <57497.193.140.74.2.1147172673.squirrel@193.140.74.2>

Since msa20 allows multi-initiator connections than I think it works with
dl380 since it has an external scsi port. Also I have heard that msa20 can
have an internal raid controller so ACU can configure it.


By the way I have doubts to use shared scsi in RHCS. Does anyone use it (I
am sure there are) and those who use it recommend it for a cheap but
RELIABLE cluster storage with RHCS?



> The MSA20 is only a disk shelf. You'd need to have it conneted to a raid
> controller
> which is built into the DL360 and above, or simply access the individual
> drives
> themselves. It does allow multi-initiator connections, but I think it's
> more
> along
> the lines of having multiple paths to an MSA500 which is a two node
> non-fibre
> SAN
> if it can even be considered a SAN since no fibre is involved. It's not
> more
> than
> a high end external storage device.
>
>
>
> Corey
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
> Sent: Monday, May 08, 2006 10:11 AM
> To: omer at faruk.net; linux clustering
> Subject: Re: [Linux-cluster] HP msa20 and rhcs?
>
> On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote:
>> Hi,
>>
>> I need a cheap shared storage and wanted to know if anyone in this
>> list used HP MSA20  (shared SCSI) on rhcs? I want to setup a 2 node
>> cluster with shared storage and cheapest HP shared storage seems to be
> MSA20..
>
> I've never used one; maybe someone else has.
>
> General rules of thumb when using SCSI shared storage:
>
> * If it requires a specific controller to make the RAID work, it probably
> is
> not a good bet, regardless of what the marketing literature would have you
> believe.
>
> * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and
> still
> has some way of accessing the array management tools (ex: a serial
> port) for configuring/presenting the LUNs, it should generally "just
> work".
>
> There are undoubtedly exceptions to these rules.  I'm not at all familiar
> with the MSA20.
>
> I do, however, have a MSA500 which has been working fine.  The MSA500
> needs
> CCISS controllers to talk to the on-board MSA controller and configure the
> LUNs during bootup.  After that, the CCISS controllers act as "dumb" SCSI
> cards when talking to the MSA500, or so it seems.  It's been working fine
> in
> a 2-node failover cluster for a couple of years, but I have not tried it
> with
> GFS.
>
> Whether that means the MSA20 will work... I do not know.  It might ;)
>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Omer Faruk Sen
http://www.faruk.net



From cjk at techma.com  Tue May  9 12:07:09 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Tue, 9 May 2006 08:07:09 -0400
Subject: [Linux-cluster] HP msa20 and rhcs?
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E80@tmaemail.techma.com>

No, it is an external disk shelf (enclosure) that happens to have a raid
(SATA) module. If the
unit is connected to a 6400 series HP controller, then the controller acts
like a dumb
scsi card, and the "onboard" raid is used. If it's plugged into an MSA1500,
then the onboard
controller is bypassed and the MSA1500 (san) is the controller. The Docs also
say it is
it's connectivity is for a _single_ host so no failover for a cluster. It's
not a good 
"enterprise" solution anyway as SATA drives are not considered a "high
performance" 
drive anwyay. Spend some cash and get a SAN and real SCSI drives. Spend some
more and 
get a bigger SAN and Fibre Attatched SCSI drives. much faster and can
actually take a 
prolonged beating.

I never said it didn't work with a DL380, I said it will work with a DL360
and above. I'd 
say DL380 is _above_ DL360.  :)

http://h18004.www1.hp.com/products/servers/proliantstorage/sharedstorage/sacl
uster/msa20/index.html

http://h18004.www1.hp.com/products/quickspecs/11942_na/11942_na.html



Anyway, it supports multiple "types" of raid cards and the starter kit gets
you a shelf and 
a raid card..  It is NOT meant for connecting two computers to..


Now back to the regularly scheduled program....


Corey

-----Original Message-----
From: Omer Faruk Sen [mailto:omer at faruk.net] 
Sent: Tuesday, May 09, 2006 7:05 AM
To: Kovacs, Corey J.
Cc: linux clustering
Subject: RE: [Linux-cluster] HP msa20 and rhcs?

Since msa20 allows multi-initiator connections than I think it works with
dl380 since it has an external scsi port. Also I have heard that msa20 can
have an internal raid controller so ACU can configure it.


By the way I have doubts to use shared scsi in RHCS. Does anyone use it (I am
sure there are) and those who use it recommend it for a cheap but RELIABLE
cluster storage with RHCS?



> The MSA20 is only a disk shelf. You'd need to have it conneted to a 
> raid controller which is built into the DL360 and above, or simply 
> access the individual drives themselves. It does allow multi-initiator 
> connections, but I think it's more along the lines of having multiple 
> paths to an MSA500 which is a two node non-fibre SAN if it can even be 
> considered a SAN since no fibre is involved. It's not more than a high 
> end external storage device.
>
>
>
> Corey
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
> Sent: Monday, May 08, 2006 10:11 AM
> To: omer at faruk.net; linux clustering
> Subject: Re: [Linux-cluster] HP msa20 and rhcs?
>
> On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote:
>> Hi,
>>
>> I need a cheap shared storage and wanted to know if anyone in this 
>> list used HP MSA20  (shared SCSI) on rhcs? I want to setup a 2 node 
>> cluster with shared storage and cheapest HP shared storage seems to 
>> be
> MSA20..
>
> I've never used one; maybe someone else has.
>
> General rules of thumb when using SCSI shared storage:
>
> * If it requires a specific controller to make the RAID work, it 
> probably is not a good bet, regardless of what the marketing 
> literature would have you believe.
>
> * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and 
> still has some way of accessing the array management tools (ex: a 
> serial
> port) for configuring/presenting the LUNs, it should generally "just 
> work".
>
> There are undoubtedly exceptions to these rules.  I'm not at all 
> familiar with the MSA20.
>
> I do, however, have a MSA500 which has been working fine.  The MSA500 
> needs CCISS controllers to talk to the on-board MSA controller and 
> configure the LUNs during bootup.  After that, the CCISS controllers 
> act as "dumb" SCSI cards when talking to the MSA500, or so it seems.  
> It's been working fine in a 2-node failover cluster for a couple of 
> years, but I have not tried it with GFS.
>
> Whether that means the MSA20 will work... I do not know.  It might ;)
>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
Omer Faruk Sen
http://www.faruk.net



From cjk at techma.com  Tue May  9 12:16:07 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Tue, 9 May 2006 08:16:07 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E81@tmaemail.techma.com>

Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. Do you
have a shared storage device? If /dev/sdb1 is not a shared device, then I
think
you might need to take a step back and get a hold of a SAN of some type. If
you 
are just playing around, there are ways to get some firewire drives to accept

two hosts and act like a cheap shared devices. There are docs on the Oracle 
site documenting the process of setting up the drive and the kernel. Note,
that
you'll only be able to use two nodes using the firewire idea.

Also, you should specify a partition for the command below. That partition
can
be very small. Something on the order of 10MB sounds right. Even that is
probably
way too big. Then use the rest for GFS storage pools.


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
Sent: Monday, May 08, 2006 9:32 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] question about creating partitions and gfs

so still following instructions at
http://www.gyrate.org/archives/9
im at the part that says

"# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"

in my config, I have the dell PERC 4/DC cards, and I believe the logical
drive showed up as /dev/sdb 

so do I need to create a partition on this logical drive with fdisk first
before I run

 ccs_tool create /root/cluster  /dev/sdb1

or am I totally off track here?

i did ccs_tool create /root/cluster /dev/sdb and it seemed to work fine, but
doesnt seem right..

Jason

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From hkubota at gmx.net  Tue May  9 14:52:46 2006
From: hkubota at gmx.net (Harald Kubota)
Date: Tue, 09 May 2006 23:52:46 +0900
Subject: [Linux-cluster] Centralized Cron
In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com>
References: <20060504190651.30345.qmail@webmail10.rediffmail.com>
Message-ID: <4460ACBE.3070505@gmx.net>

saju john wrote:
>
>
> Is there any way to make a centalized cron
>

at work (more clusters, all Veritas though) we use autosys, which is 
basically
a program scheduler to start programs on other machines.
cron jobs are for jobs which are for each node (e.g. sending a health status
to a central server once a day), thus bound to a physical machine.
autosys is for everything which can move (cluster service groups, e.g. 
running
a DB cleanup job on the node which runs the DB). Since each
service group has its own IP address and DNS entry, autosys simply connect
to that IP address and executes a script (and does more like checking the
status, handling timeouts, sending out alarms if anything went wrong etc.).

Since autosys is commercial software and required quite some infrastructure,
a simpler approach is to set up one machine (maybe cluster it ;-)
which starts jobs on other machines according to a list it maintains.

Harald



From mwill at penguincomputing.com  Tue May  9 15:03:54 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Tue, 9 May 2006 08:03:54 -0700
Subject: [Linux-cluster] Centralized Cron
Message-ID: <433093DF7AD7444DA65EFAFE3987879C125CCF@jellyfish.highlyscyld.com>

Or use cron on the headnode to submit jobs to the clusterscheduler if that does not support recurring timed jobs...

 -----Original Message-----
From: 	Harald Kubota [mailto:hkubota at gmx.net]
Sent:	Tue May 09 07:53:08 2006
To:	linux clustering
Subject:	Re: [Linux-cluster] Centralized Cron

saju john wrote:
>
>
> Is there any way to make a centalized cron
>

at work (more clusters, all Veritas though) we use autosys, which is 
basically
a program scheduler to start programs on other machines.
cron jobs are for jobs which are for each node (e.g. sending a health status
to a central server once a day), thus bound to a physical machine.
autosys is for everything which can move (cluster service groups, e.g. 
running
a DB cleanup job on the node which runs the DB). Since each
service group has its own IP address and DNS entry, autosys simply connect
to that IP address and executes a script (and does more like checking the
status, handling timeouts, sending out alarms if anything went wrong etc.).

Since autosys is commercial software and required quite some infrastructure,
a simpler approach is to set up one machine (maybe cluster it ;-)
which starts jobs on other machines according to a list it maintains.

Harald

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060509/2ae5adb6/attachment.htm>

From nick at sqrt.co.uk  Tue May  9 16:25:19 2006
From: nick at sqrt.co.uk (Nick Burrett)
Date: Tue, 09 May 2006 09:25:19 -0700
Subject: [Linux-cluster] Centralized Cron
In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com>
References: <20060504190651.30345.qmail@webmail10.rediffmail.com>
Message-ID: <4460C26F.6070108@sqrt.co.uk>

saju john wrote:
>  
> 
> Dear All,
> 
> Is there any way to make a centalized cron while using Redhat HA cluster 
> with Sahred storage. I mean to put the crontab entry for a particular 
> user on shared storage, so that when the cluster shifts, on the other 
> node cron should read from the cron file in shared storage.

If you want some form of high availability cron, you could try to 
leverage the Condor application to suit your needs.  If you link your 
cron applications against the Condor libraries, then you get process 
check pointing and all sorts of other wonderful stuff.


> This setup has the advantage that we don't need to manullay update the 
> cron entry in both nodes.

> b) Soft link the cron directory in /var/spool to 
> /path/to/shared/storage/cron. This is working till the cluster shift. 
> The cron is getting dead when the cluster shifts as it lose the 
> /var/spool/cron link's destination driectory which will be mapped to the 
> other node

You need to add in a heartbeat trigger.  The cron daemon runs on one 
server only.  When that server goes offline, then start the cron daemon 
on the backup server.  This is a terrible solution though.


Nick.



From jason at monsterjam.org  Wed May 10 00:23:12 2006
From: jason at monsterjam.org (Jason)
Date: Tue, 9 May 2006 20:23:12 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E81@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E81@tmaemail.techma.com>
Message-ID: <20060510002312.GA4927@monsterjam.org>

yes, both boxes are connected to the storage, its a dell powervault 220S
configured for cluster mode. 

[root at tf1 cluster]#  fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf1 cluster]# 

[root at tf2 cluster]# fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf2 cluster]# 


so both sides see the storage.  

on tf1, I can start ccsd fine, but on tf2, I cant, and I see
May  8 22:00:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
May  8 22:00:21 tf2 ccsd: startup failed
May  9 20:17:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
May  9 20:17:21 tf2 ccsd: startup failed
May  9 20:17:30 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
May  9 20:17:30 tf2 ccsd: startup failed
[root at tf2 cluster]# 

in the logs

Jason




On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. Do you
> have a shared storage device? If /dev/sdb1 is not a shared device, then I
> think
> you might need to take a step back and get a hold of a SAN of some type. If
> you 
> are just playing around, there are ways to get some firewire drives to accept
> 
> two hosts and act like a cheap shared devices. There are docs on the Oracle 
> site documenting the process of setting up the drive and the kernel. Note,
> that
> you'll only be able to use two nodes using the firewire idea.
> 
> Also, you should specify a partition for the command below. That partition
> can
> be very small. Something on the order of 10MB sounds right. Even that is
> probably
> way too big. Then use the rest for GFS storage pools.
> 
> 
> Corey
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Monday, May 08, 2006 9:32 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] question about creating partitions and gfs
> 
> so still following instructions at
> http://www.gyrate.org/archives/9
> im at the part that says
> 
> "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> 
> in my config, I have the dell PERC 4/DC cards, and I believe the logical
> drive showed up as /dev/sdb 
> 
> so do I need to create a partition on this logical drive with fdisk first
> before I run
> 
>  ccs_tool create /root/cluster  /dev/sdb1
> 
> or am I totally off track here?
> 
> i did ccs_tool create /root/cluster /dev/sdb and it seemed to work fine, but
> doesnt seem right..
> 
> Jason
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================



From saju8 at rediffmail.com  Wed May 10 04:17:16 2006
From: saju8 at rediffmail.com (saju  john)
Date: 10 May 2006 04:17:16 -0000
Subject: [Linux-cluster] Centralized Cron
Message-ID: <20060510041716.8037.qmail@webmail50.rediffmail.com>

  
Dear All,

Thanks for all replay.

What i need exactly is How to make the Cron centralized and NOT how to make it not running on backup node.
Without having a centralized cron I need to edit the cron file in all nodes.This is the difficulty which I am facing.

Any suggession will be valuable.

Thank You,
Saju John



On Tue, 09 May 2006 Nick Burrett wrote :
>saju john wrote:
>>  
>>Dear All,
>>
>>Is there any way to make a centalized cron while using Redhat HA cluster with Sahred storage. I mean to put the crontab entry for a particular user on shared storage, so that when the cluster shifts, on the other node cron should read from the cron file in shared storage.
>
>If you want some form of high availability cron, you could try to leverage the Condor application to suit your needs.  If you link your cron applications against the Condor libraries, then you get process check pointing and all sorts of other wonderful stuff.
>
>
>>This setup has the advantage that we don't need to manullay update the cron entry in both nodes.
>
>>b) Soft link the cron directory in /var/spool to /path/to/shared/storage/cron. This is working till the cluster shift. The cron is getting dead when the cluster shifts as it lose the /var/spool/cron link's destination driectory which will be mapped to the other node
>
>You need to add in a heartbeat trigger.  The cron daemon runs on one server only.  When that server goes offline, then start the cron daemon on the backup server.  This is a terrible solution though.
>
>
>Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060510/510fb7fd/attachment.htm>

From cjk at techma.com  Wed May 10 12:30:58 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 10 May 2006 08:30:58 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E83@tmaemail.techma.com>

Jason, couple of questions.... (And I assume you are working with
RHEL3+GFS6.0x)


1. Are you actually using raw devices? if so, why? 
2. Does the device /dev/raw/raw64 actually exist on tf2?


GFS does not use raw devices for anything. The standard Redhat Cluster suite
does, but not GFS. GFS uses "storage pools".  Also, if memory servs me right,
later versions of GFS for RHEL3 need to be told what pools to use in the 
"/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and
"found" the pools, but no longer I believe.

Hope this helps. If not, can you give more details about your config? 



Corey


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
Sent: Tuesday, May 09, 2006 8:23 PM
To: linux clustering
Subject: Re: [Linux-cluster] question about creating partitions and gfs

yes, both boxes are connected to the storage, its a dell powervault 220S
configured for cluster mode. 

[root at tf1 cluster]#  fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
= 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf1 cluster]# 

[root at tf2 cluster]# fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
= 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf2 cluster]# 


so both sides see the storage.  

on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8 22:00:21
tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or
address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2 ccsd:
Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May  9
20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable to open
/dev/sdb1 (/dev/raw/raw64): No such device or address May  9 20:17:30 tf2
ccsd: startup failed
[root at tf2 cluster]# 

in the logs

Jason




On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> Do you have a shared storage device? If /dev/sdb1 is not a shared 
> device, then I think you might need to take a step back and get a hold 
> of a SAN of some type. If you are just playing around, there are ways 
> to get some firewire drives to accept
> 
> two hosts and act like a cheap shared devices. There are docs on the 
> Oracle site documenting the process of setting up the drive and the 
> kernel. Note, that you'll only be able to use two nodes using the 
> firewire idea.
> 
> Also, you should specify a partition for the command below. That 
> partition can be very small. Something on the order of 10MB sounds 
> right. Even that is probably way too big. Then use the rest for GFS 
> storage pools.
> 
> 
> Corey
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Monday, May 08, 2006 9:32 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] question about creating partitions and gfs
> 
> so still following instructions at
> http://www.gyrate.org/archives/9
> im at the part that says
> 
> "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> 
> in my config, I have the dell PERC 4/DC cards, and I believe the 
> logical drive showed up as /dev/sdb
> 
> so do I need to create a partition on this logical drive with fdisk 
> first before I run
> 
>  ccs_tool create /root/cluster  /dev/sdb1
> 
> or am I totally off track here?
> 
> i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> fine, but doesnt seem right..
> 
> Jason
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Wed May 10 12:33:04 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 10 May 2006 08:33:04 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E84@tmaemail.techma.com>

Jason, I just realized what the problem is. You need to apply the config to a
"pool"
not a normal device.  What do your pooll definitions look like? The one you
created
for the config is where you need to point ccs_tool at to activate the
config...


Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
Sent: Wednesday, May 10, 2006 8:31 AM
To: linux clustering
Subject: RE: [Linux-cluster] question about creating partitions and gfs

Jason, couple of questions.... (And I assume you are working with
RHEL3+GFS6.0x)


1. Are you actually using raw devices? if so, why? 
2. Does the device /dev/raw/raw64 actually exist on tf2?


GFS does not use raw devices for anything. The standard Redhat Cluster suite
does, but not GFS. GFS uses "storage pools".  Also, if memory servs me right,
later versions of GFS for RHEL3 need to be told what pools to use in the
"/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and
"found" the pools, but no longer I believe.

Hope this helps. If not, can you give more details about your config? 



Corey


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
Sent: Tuesday, May 09, 2006 8:23 PM
To: linux clustering
Subject: Re: [Linux-cluster] question about creating partitions and gfs

yes, both boxes are connected to the storage, its a dell powervault 220S
configured for cluster mode. 

[root at tf1 cluster]#  fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
= 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf1 cluster]# 

[root at tf2 cluster]# fdisk -l /dev/sdb

Disk /dev/sdb: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
= 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      2433  19543041   83  Linux
[root at tf2 cluster]# 


so both sides see the storage.  

on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8 22:00:21
tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or
address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2 ccsd:
Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May  9
20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable to open
/dev/sdb1 (/dev/raw/raw64): No such device or address May  9 20:17:30 tf2
ccsd: startup failed
[root at tf2 cluster]# 

in the logs

Jason




On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> Do you have a shared storage device? If /dev/sdb1 is not a shared 
> device, then I think you might need to take a step back and get a hold 
> of a SAN of some type. If you are just playing around, there are ways 
> to get some firewire drives to accept
> 
> two hosts and act like a cheap shared devices. There are docs on the 
> Oracle site documenting the process of setting up the drive and the 
> kernel. Note, that you'll only be able to use two nodes using the 
> firewire idea.
> 
> Also, you should specify a partition for the command below. That 
> partition can be very small. Something on the order of 10MB sounds 
> right. Even that is probably way too big. Then use the rest for GFS 
> storage pools.
> 
> 
> Corey
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Monday, May 08, 2006 9:32 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] question about creating partitions and gfs
> 
> so still following instructions at
> http://www.gyrate.org/archives/9
> im at the part that says
> 
> "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> 
> in my config, I have the dell PERC 4/DC cards, and I believe the 
> logical drive showed up as /dev/sdb
> 
> so do I need to create a partition on this logical drive with fdisk 
> first before I run
> 
>  ccs_tool create /root/cluster  /dev/sdb1
> 
> or am I totally off track here?
> 
> i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> fine, but doesnt seem right..
> 
> Jason
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From jason at monsterjam.org  Wed May 10 13:07:58 2006
From: jason at monsterjam.org (Jason)
Date: Wed, 10 May 2006 09:07:58 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E83@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E83@tmaemail.techma.com>
Message-ID: <20060510130758.GA48550@monsterjam.org>

On Wed, May 10, 2006 at 08:30:58AM -0400, Kovacs, Corey J. wrote:
> Jason, couple of questions.... (And I assume you are working with
> RHEL3+GFS6.0x)

[root at tf1 cluster]# cat /etc/redhat-release 
Red Hat Enterprise Linux AS release 3 (Taroon Update 7)
[root at tf1 cluster]# 

[root at tf1 cluster]# rpm -qa | grep -i gfs
GFS-modules-smp-6.0.2.30-0
GFS-devel-6.0.2.30-0
GFS-debuginfo-6.0.2.30-0
GFS-6.0.2.30-0
GFS-modules-6.0.2.30-0
[root at tf1 cluster]# 


> 
> 
> 1. Are you actually using raw devices? if so, why? 
not intentionally.. ;)

> 2. Does the device /dev/raw/raw64 actually exist on tf2?

[root at tf2 cluster]# !ls
ls -al /dev/raw/raw64
crw-rw----    1 root     disk     162,  64 Jun 24  2004 /dev/raw/raw64
[root at tf2 cluster]# 

[root at tf1 cluster]# !ls
ls -al /dev/raw/raw64
crw-rw----    1 root     disk     162,  64 Jun 24  2004 /dev/raw/raw64
[root at tf1 cluster]# 

so theyre both there..

> 
> 
> GFS does not use raw devices for anything. The standard Redhat Cluster suite
> does, but not GFS. GFS uses "storage pools".  Also, if memory servs me right,
> later versions of GFS for RHEL3 need to be told what pools to use in the 
> "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and
> "found" the pools, but no longer I believe.
> 

in /etc/sysconfig/gfs on both boxes, I have

CCS_ARCHIVE="/dev/sdb1" (everything else is commented out)

regards,

Jason



From lhh at redhat.com  Wed May 10 14:15:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 10 May 2006 10:15:38 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <20060509013213.GA91908@monsterjam.org>
References: <20060509013213.GA91908@monsterjam.org>
Message-ID: <1147270538.11396.72.camel@ayanami.boston.redhat.com>

On Mon, 2006-05-08 at 21:32 -0400, Jason wrote:
> so still following instructions at 
> http://www.gyrate.org/archives/9
> im at the part that says
> 
> "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> 
> in my config, I have the dell PERC 4/DC cards, and I believe the logical drive showed up as 
> /dev/sdb 
> 
> so do I need to create a partition on this logical drive with fdisk first before I run

Yes.

> i did ccs_tool create /root/cluster /dev/sdb
> and it seemed to work fine, but doesnt seem right..

Well, you could do that, but you're using the entire logical drive for
the configuration archive -- which is probably not what you want.

-- Lon




From stephen.willey at framestore-cfc.com  Tue May  9 13:14:38 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 09 May 2006 14:14:38 +0100
Subject: [Linux-cluster] GFS resiliancy
Message-ID: <446095BE.4010706@framestore-cfc.com>

If the GFS filesystem is built over several discs LVMed together, how
will it behave in the event of a failure of one of those discs (when
setup with a linear striped LVM)?

Is there any way of securing the filesystem against the failure of any
one physical volume?

Thanks,

Stephen



From stephen.willey at framestore-cfc.com  Tue May  9 13:18:12 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 09 May 2006 14:18:12 +0100
Subject: [Linux-cluster] Size limits of the various components
Message-ID: <44609694.7060609@framestore-cfc.com>

We're testing GFS on 64 bit servers/64 bit RHEL4 and need to know how
big LVM2 and GFS will scale.

Can anyone tell me the maximum sizes of these component parts:

GFS filesystem
(C)LVM2 logical volume
(C)LVM2 volume group
(C)LVM2 physical volumes

We're considering building a filesystem that may need to scale to 100Tb
or more and I've found various different answers on this list and elsewhere.

Stephen



From stephen.willey at framestore-cfc.com  Tue May  9 13:23:43 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 09 May 2006 14:23:43 +0100
Subject: [Linux-cluster] Slow dfs
Message-ID: <446097DF.4080005@framestore-cfc.com>

I saw a question a while back from Jeffrey Bethke about speeding up df
operations.

We're considering building a large GFS filesystem and the 11Tb
filesystem that we have now can take a very long time to return from a
df (either regular or using gfs_tool).

Once it's run once it appears to cache the information and runs quite
quickly, but we're concerned about how this will scale once we get up to
100 or so Tb.  Waiting ages for a df to return isn't ideal...

Stephen



From eric at bootseg.com  Wed May 10 15:59:56 2006
From: eric at bootseg.com (Eric Kerin)
Date: Wed, 10 May 2006 11:59:56 -0400
Subject: [Linux-cluster] GFS resiliancy
In-Reply-To: <446095BE.4010706@framestore-cfc.com>
References: <446095BE.4010706@framestore-cfc.com>
Message-ID: <1147276797.3533.6.camel@auh5-0479.corp.jabil.org>

On Tue, 2006-05-09 at 14:14 +0100, Stephen Willey wrote:
> If the GFS filesystem is built over several discs LVMed together, how
> will it behave in the event of a failure of one of those discs (when
> setup with a linear striped LVM)?
> 
> Is there any way of securing the filesystem against the failure of any
> one physical volume?
> 
With GFS this needs to be handled at the storage layer, you'll need a
storage subsystem that supports some form of RAID (4/5/1+0/etc) to keep
disk failures from destroying your filesystem.

You can't use software RAID, or LVM, since they aren't currently cluster
aware.

Thanks, 
Eric Kerin
eric at bootseg.com




From teigland at redhat.com  Wed May 10 16:30:14 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 10 May 2006 11:30:14 -0500
Subject: [Linux-cluster] Slow dfs
In-Reply-To: <446097DF.4080005@framestore-cfc.com>
References: <446097DF.4080005@framestore-cfc.com>
Message-ID: <20060510163014.GB26524@redhat.com>

On Tue, May 09, 2006 at 02:23:43PM +0100, Stephen Willey wrote:
> I saw a question a while back from Jeffrey Bethke about speeding up df
> operations.
> 
> We're considering building a large GFS filesystem and the 11Tb
> filesystem that we have now can take a very long time to return from a
> df (either regular or using gfs_tool).
> 
> Once it's run once it appears to cache the information and runs quite
> quickly, but we're concerned about how this will scale once we get up to
> 100 or so Tb.  Waiting ages for a df to return isn't ideal...

It'll get slower as the fs grows.  In GFS2 df will have no delay at all
regardless of fs size -- tradeoff is that it's "fuzzy", not perfectly
accurate.

Dave



From jlbeti at dsic.upv.es  Wed May 10 16:33:25 2006
From: jlbeti at dsic.upv.es (Jose Luis Beti)
Date: Wed, 10 May 2006 18:33:25 +0200
Subject: [Linux-cluster] Are raw partitions needed in RHCS4?
Message-ID: <1147278805.2595.46.camel@superlopez.dsic.upv.es>

Hi all,

RHCS4 manual talks about crating 2 raw partitions if we are using shared
storage, but after creating them, they are not used any more.

Anyone could tell me if it's necessary to create raw partitions?

Thanks in advanced.
Sorry if the question has been answer before.

Jose Luis.
-- 
------------------------------------------------
Jose Luis Beti
Departament de Sistemes Informatics i Computacio
Universitat Politecnica de Valencia
Telefon: 963877355
Extensio: 73553



From cjk at techma.com  Wed May 10 17:36:59 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 10 May 2006 13:36:59 -0400
Subject: [Linux-cluster] Are raw partitions needed in RHCS4?
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E85@tmaemail.techma.com>

RHCS4? or RHCS3?   RHCS3 uses them, not 4. If it's in the docs for 4 it's a
mistake. 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jose Luis Beti
Sent: Wednesday, May 10, 2006 12:33 PM
To: linux clustering
Subject: [Linux-cluster] Are raw partitions needed in RHCS4?

Hi all,

RHCS4 manual talks about crating 2 raw partitions if we are using shared
storage, but after creating them, they are not used any more.

Anyone could tell me if it's necessary to create raw partitions?

Thanks in advanced.
Sorry if the question has been answer before.

Jose Luis.
--
------------------------------------------------
Jose Luis Beti
Departament de Sistemes Informatics i Computacio Universitat Politecnica de
Valencia
Telefon: 963877355
Extensio: 73553

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From jlbeti at dsic.upv.es  Wed May 10 18:45:33 2006
From: jlbeti at dsic.upv.es (Jose Luis Beti)
Date: Wed, 10 May 2006 20:45:33 +0200
Subject: [Linux-cluster] Are raw partitions needed in RHCS4?
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E85@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E85@tmaemail.techma.com>
Message-ID: <1147286733.2595.59.camel@superlopez.dsic.upv.es>

I was talking about RHCS4.
Thanks again.

El mi?, 10-05-2006 a las 13:36 -0400, Kovacs, Corey J. escribi?:
-- 
------------------------------------------------
Jose Luis Beti
Departament de Sistemes Informatics i Computacio
Universitat Politecnica de Valencia
Telefon: 963877355
Extensio: 73553



From gstaltari at arnet.net.ar  Wed May 10 19:12:28 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 10 May 2006 16:12:28 -0300
Subject: [Linux-cluster] lot of scsi devices bug
Message-ID: <44623B1C.2040108@arnet.net.ar>

Hi, this is maybe a udev bug, but it affected me when I was creating a 
lv in a cluster, so it could help some with this configuration.
When I added some scsi disk (SAN) to the cluster nodes (more than 64 
SCSI devices),  udev created the device node for capi20 instead of sdbm. 
This produced  a bad behavior in lvm when I was trying to create the 
vg's and lv's, it started to give errors like:

Error locking on node node-06: Internal lvm error, check syslog
Error locking on node node-05: Internal lvm error, check syslog
Error locking on node node-04: Internal lvm error, check syslog
Error locking on node node-01: Internal lvm error, check syslog
Error locking on node node-02: Internal lvm error, check syslog
Error locking on node node-03: Internal lvm error, check syslog
Failed to activate new LV.

When I commented out this lines

SYSFS{dev}="68:0",              NAME="capi20"
SYSFS{dev}="191:[0-9]*",        NAME="capi/%n"
KERNEL=="capi*",                MODE="0660"

in /etc/udev/rules.d/50-udev.rules, everything worked again.

I hope this could help,
German Staltari



From gstaltari at arnet.net.ar  Wed May 10 20:54:32 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 10 May 2006 17:54:32 -0300
Subject: [Linux-cluster] lot of scsi devices bug
In-Reply-To: <44623B1C.2040108@arnet.net.ar>
References: <44623B1C.2040108@arnet.net.ar>
Message-ID: <44625308.5010402@arnet.net.ar>

German Staltari wrote:
> Hi, this is maybe a udev bug, but it affected me when I was creating a 
> lv in a cluster, so it could help some with this configuration.
> When I added some scsi disk (SAN) to the cluster nodes (more than 64 
> SCSI devices),  udev created the device node for capi20 instead of 
> sdbm. This produced  a bad behavior in lvm when I was trying to create 
> the vg's and lv's, it started to give errors like:
>
> Error locking on node node-06: Internal lvm error, check syslog
> Error locking on node node-05: Internal lvm error, check syslog
> Error locking on node node-04: Internal lvm error, check syslog
> Error locking on node node-01: Internal lvm error, check syslog
> Error locking on node node-02: Internal lvm error, check syslog
> Error locking on node node-03: Internal lvm error, check syslog
> Failed to activate new LV.
>
> When I commented out this lines
>
> SYSFS{dev}="68:0",              NAME="capi20"
> SYSFS{dev}="191:[0-9]*",        NAME="capi/%n"
> KERNEL=="capi*",                MODE="0660"
>
> in /etc/udev/rules.d/50-udev.rules, everything worked again.
>
> I hope this could help,
> German Staltari
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
Forgot to add:
FC4 system, totally updated.



From jason at monsterjam.org  Thu May 11 00:53:21 2006
From: jason at monsterjam.org (Jason)
Date: Wed, 10 May 2006 20:53:21 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E84@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E84@tmaemail.techma.com>
Message-ID: <20060511005321.GA45370@monsterjam.org>

ummm I was thinking that was the answer too, but I have no idea what the "pool" device is..
how can I tell?

Jason


On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote:
> Jason, I just realized what the problem is. You need to apply the config to a
> "pool"
> not a normal device.  What do your pooll definitions look like? The one you
> created
> for the config is where you need to point ccs_tool at to activate the
> config...
> 
> 
> Corey 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
> Sent: Wednesday, May 10, 2006 8:31 AM
> To: linux clustering
> Subject: RE: [Linux-cluster] question about creating partitions and gfs
> 
> Jason, couple of questions.... (And I assume you are working with
> RHEL3+GFS6.0x)
> 
> 
> 1. Are you actually using raw devices? if so, why? 
> 2. Does the device /dev/raw/raw64 actually exist on tf2?
> 
> 
> GFS does not use raw devices for anything. The standard Redhat Cluster suite
> does, but not GFS. GFS uses "storage pools".  Also, if memory servs me right,
> later versions of GFS for RHEL3 need to be told what pools to use in the
> "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and
> "found" the pools, but no longer I believe.
> 
> Hope this helps. If not, can you give more details about your config? 
> 
> 
> 
> Corey
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Tuesday, May 09, 2006 8:23 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] question about creating partitions and gfs
> 
> yes, both boxes are connected to the storage, its a dell powervault 220S
> configured for cluster mode. 
> 
> [root at tf1 cluster]#  fdisk -l /dev/sdb
> 
> Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
> = 8225280 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1      2433  19543041   83  Linux
> [root at tf1 cluster]# 
> 
> [root at tf2 cluster]# fdisk -l /dev/sdb
> 
> Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512
> = 8225280 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1      2433  19543041   83  Linux
> [root at tf2 cluster]# 
> 
> 
> so both sides see the storage.  
> 
> on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8 22:00:21
> tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or
> address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2 ccsd:
> Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May  9
> 20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable to open
> /dev/sdb1 (/dev/raw/raw64): No such device or address May  9 20:17:30 tf2
> ccsd: startup failed
> [root at tf2 cluster]# 
> 
> in the logs
> 
> Jason
> 
> 
> 
> 
> On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> > Do you have a shared storage device? If /dev/sdb1 is not a shared 
> > device, then I think you might need to take a step back and get a hold 
> > of a SAN of some type. If you are just playing around, there are ways 
> > to get some firewire drives to accept
> > 
> > two hosts and act like a cheap shared devices. There are docs on the 
> > Oracle site documenting the process of setting up the drive and the 
> > kernel. Note, that you'll only be able to use two nodes using the 
> > firewire idea.
> > 
> > Also, you should specify a partition for the command below. That 
> > partition can be very small. Something on the order of 10MB sounds 
> > right. Even that is probably way too big. Then use the rest for GFS 
> > storage pools.
> > 
> > 
> > Corey
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > Sent: Monday, May 08, 2006 9:32 PM
> > To: linux-cluster at redhat.com
> > Subject: [Linux-cluster] question about creating partitions and gfs
> > 
> > so still following instructions at
> > http://www.gyrate.org/archives/9
> > im at the part that says
> > 
> > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> > 
> > in my config, I have the dell PERC 4/DC cards, and I believe the 
> > logical drive showed up as /dev/sdb
> > 
> > so do I need to create a partition on this logical drive with fdisk 
> > first before I run
> > 
> >  ccs_tool create /root/cluster  /dev/sdb1
> > 
> > or am I totally off track here?
> > 
> > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> > fine, but doesnt seem right..
> > 
> > Jason
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> ================================================
> |    Jason Welsh   jason at monsterjam.org        |
> | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> |    gpg key: http://monsterjam.org/gpg/       |
> ================================================
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================



From mathieu.avila at seanodes.com  Thu May 11 06:44:35 2006
From: mathieu.avila at seanodes.com (Mathieu Avila)
Date: Thu, 11 May 2006 08:44:35 +0200
Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file
Message-ID: <4462DD53.3060602@seanodes.com>

Hello,

In GFS 6.0, has anybody experienced using a CCA archive on a local file 
instead of a shared volume or server ?
More precisely, if using this method correctly by managing the 
consistency of the configuration file over all nodes, is there any 
greater risk of data corruption than with a shared volume archive or a 
server ? I am in the special case where i don't want to manage another 
shared volume, and this option, although less documented, seems better 
to me. Documentation only tells that it is "less recommanded".

Thanks in advance,

--
Mathieu



From carlopmart at gmail.com  Thu May 11 08:04:35 2006
From: carlopmart at gmail.com (carlopmart)
Date: Thu, 11 May 2006 10:04:35 +0200
Subject: [Linux-cluster] Postgresql under RHCS4
Message-ID: <4462F013.40201@gmail.com>

Hi all,

  Somebody have tried to setup two nodes with Postgresql under RHCS4?. 
Is it possible to do this without shared storage like mysql cluster 
feature does?

Thank you very much.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From devrim at gunduz.org  Thu May 11 08:18:58 2006
From: devrim at gunduz.org (Devrim GUNDUZ)
Date: Thu, 11 May 2006 11:18:58 +0300 (EEST)
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <4462F013.40201@gmail.com>
References: <4462F013.40201@gmail.com>
Message-ID: <Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>


Hi,

On Thu, 11 May 2006, carlopmart wrote:

> Somebody have tried to setup two nodes with Postgresql under RHCS4?. Is it 
> possible to do this without shared storage like mysql cluster feature does?

If you want to run an active/passive cluster, then go on with 
RHCS+ext{2,3}(or GFS). PostgreSQL cannot run on active/active cluster 
systems, natively. However, you might give PgCluster a try. Even though it 
is not the best way, it is worth trying. Slony-II will be implementing 
multimaster replication feature, but it is still under development.

BTW, GFS is not a prerequisite, if you run an active/passive cluster; 
however I used it, in order to prevent a spof.

Regards,
--
Devrim GUNDUZ
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                       http://www.gunduz.org



From nattaponv at hotmail.com  Thu May 11 08:21:43 2006
From: nattaponv at hotmail.com (nattapon viroonsri)
Date: Thu, 11 May 2006 08:21:43 +0000
Subject: [Linux-cluster] fence_manual problem
Message-ID: <BAY22-F12E0AA524FCB75CB22ACEAA6AF0@phx.gbl>

I use rhcs 4  on rhel 4.0
I have setup 2 node cluster use manual fenceing
node name = cluster1 , cluster2
It failover completely if i stop service for each nod
But when i try to disconnected cable from cluster1 both node try to fence 
each other and
have following log:

May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1"
May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: 
fence_manual no node name
May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed

I try to run  "fence_ack_manual -n node1" but it's out put show that  have 
no file "/tmp/fence_manual.fifo"
so i create "/tmp/fence_manual.fifo" manually  and re run fence_ack_manual 
it show "done"

but in the logfile still the same as if no thing happen and  service still 
not failover

Nattapon,
Regard

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



From nattaponv at hotmail.com  Thu May 11 08:46:53 2006
From: nattaponv at hotmail.com (nattapon viroonsri)
Date: Thu, 11 May 2006 08:46:53 +0000
Subject: [Linux-cluster] manual_fence problem
Message-ID: <BAY22-F145651A59A31B65F3810A9A6AF0@phx.gbl>

I use rhcs 4  on rhel 4.0
I have setup 2 node cluster use manual fenceing
node name = cluster1 , cluster2
It failover completely if i stop service for each nod
But when i try to disconnected cable from cluster1 both node try to fence 
each other and
have following log:

May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1"
May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: 
fence_manual no node name
May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed

>From system-config-cluster menu it have no parameter to specify  node name 
for manual fencing but in command line can.
so I try to run  "fence_ack_manual -n node1" but it's out put show that  
have no file "/tmp/fence_manual.fifo"
after i create "/tmp/fence_manual.fifo" manually  and re run 
fence_ack_manual it show "done"

but in the logfile still the same as if no thing happen and  service still 
not failover

Nattapon,
Regard

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



From cjk at techma.com  Thu May 11 11:16:14 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 11 May 2006 07:16:14 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <20060511005321.GA45370@monsterjam.org>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E86@tmaemail.techma.com>

Jason, the docs should run through the creation of the pool devices. They can
be
a bit of a labrynth though, so here is an example called "pool_cca.cfg".


<----cut here---->
poolname pool_cca			#name of the pool/volume to create
subpools 1				#how many subpools make up this
pool/volume (always starts as 1)
subpool 0 128 1 gfs_data	#first subpool, zero indexed, 128k stripe, 1
devices
pooldevice 0 0 /dev/sdb1	#physical device for pool 0, device 0 (again,
zero indexed)
<-end cut here -->

Additional pools just need a different "poolname" and "pooldevice". 

NOTE, the cluster nodes need to be "seeing" the devices listed as pooldevices
the same
way. node1 sees the second physical disk as /dev/sdb, then third as /dev/sdc
and so on.


Now, if you make /dev/sdb1 about 10MB, you'll have enough space to create a
cluster
config pool. Then to actually use it, you need to do the following...

pool_tool -c pool_cca.cfg

then you can issue ...

service pool start

on all nodes. Just make sure all nodes have a clean view of the partition 
table (reboot, or issue partprobe).

Once you have the cca pool created and activated, you can apply the cluster
config
to it...

ccs_tool create /path/to/configs/   /dev/pool/pool_cca

Then do a "service ccsd start" on all nodes followed by "service lock_gulmd
start"
on all nodes..

To check to see if things are working...do...

gulm_tool nodelist nameofalockserver

and you should see a list of your nodes and some info about each one.

That's should be enough to get you started. to add storage for actual gfs
filesystems, simply
create more pools. you can also expand pools by adding subpools after
creation. It's sort of
a poor mans volume management if you will. It can be done to a running system
and the filesystem
on top of it can be expaned live as well.


Anyway, hope this helps...


Corey


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
Sent: Wednesday, May 10, 2006 8:53 PM
To: linux clustering
Subject: Re: [Linux-cluster] question about creating partitions and gfs

ummm I was thinking that was the answer too, but I have no idea what the
"pool" device is..
how can I tell?

Jason


On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote:
> Jason, I just realized what the problem is. You need to apply the 
> config to a "pool"
> not a normal device.  What do your pooll definitions look like? The 
> one you created for the config is where you need to point ccs_tool at 
> to activate the config...
> 
> 
> Corey
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
> Sent: Wednesday, May 10, 2006 8:31 AM
> To: linux clustering
> Subject: RE: [Linux-cluster] question about creating partitions and 
> gfs
> 
> Jason, couple of questions.... (And I assume you are working with
> RHEL3+GFS6.0x)
> 
> 
> 1. Are you actually using raw devices? if so, why? 
> 2. Does the device /dev/raw/raw64 actually exist on tf2?
> 
> 
> GFS does not use raw devices for anything. The standard Redhat Cluster 
> suite does, but not GFS. GFS uses "storage pools".  Also, if memory 
> servs me right, later versions of GFS for RHEL3 need to be told what 
> pools to use in the "/etc/sysconfig/gfs" config file. Used to be that 
> GFS just did a scan and "found" the pools, but no longer I believe.
> 
> Hope this helps. If not, can you give more details about your config? 
> 
> 
> 
> Corey
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Tuesday, May 09, 2006 8:23 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] question about creating partitions and 
> gfs
> 
> yes, both boxes are connected to the storage, its a dell powervault 
> 220S configured for cluster mode.
> 
> [root at tf1 cluster]#  fdisk -l /dev/sdb
> 
> Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 
> 16065 * 512 = 8225280 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1      2433  19543041   83  Linux
> [root at tf1 cluster]#
> 
> [root at tf2 cluster]# fdisk -l /dev/sdb
> 
> Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 
> 16065 * 512 = 8225280 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1      2433  19543041   83  Linux
> [root at tf2 cluster]#
> 
> 
> so both sides see the storage.  
> 
> on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8 
> 22:00:21
> tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or 
> address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2 ccsd:
> Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
> May  9
> 20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable to 
> open
> /dev/sdb1 (/dev/raw/raw64): No such device or address May  9 20:17:30 
> tf2
> ccsd: startup failed
> [root at tf2 cluster]#
> 
> in the logs
> 
> Jason
> 
> 
> 
> 
> On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> > Do you have a shared storage device? If /dev/sdb1 is not a shared 
> > device, then I think you might need to take a step back and get a 
> > hold of a SAN of some type. If you are just playing around, there 
> > are ways to get some firewire drives to accept
> > 
> > two hosts and act like a cheap shared devices. There are docs on the 
> > Oracle site documenting the process of setting up the drive and the 
> > kernel. Note, that you'll only be able to use two nodes using the 
> > firewire idea.
> > 
> > Also, you should specify a partition for the command below. That 
> > partition can be very small. Something on the order of 10MB sounds 
> > right. Even that is probably way too big. Then use the rest for GFS 
> > storage pools.
> > 
> > 
> > Corey
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > Sent: Monday, May 08, 2006 9:32 PM
> > To: linux-cluster at redhat.com
> > Subject: [Linux-cluster] question about creating partitions and gfs
> > 
> > so still following instructions at
> > http://www.gyrate.org/archives/9
> > im at the part that says
> > 
> > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> > 
> > in my config, I have the dell PERC 4/DC cards, and I believe the 
> > logical drive showed up as /dev/sdb
> > 
> > so do I need to create a partition on this logical drive with fdisk 
> > first before I run
> > 
> >  ccs_tool create /root/cluster  /dev/sdb1
> > 
> > or am I totally off track here?
> > 
> > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> > fine, but doesnt seem right..
> > 
> > Jason
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> ================================================
> |    Jason Welsh   jason at monsterjam.org        |
> | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> |    gpg key: http://monsterjam.org/gpg/       |
> ================================================
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Thu May 11 11:20:12 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 11 May 2006 07:20:12 -0400
Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file
In-Reply-To: <4462DD53.3060602@seanodes.com>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E87@tmaemail.techma.com>

I've used it (in testing) and I think the main reasons it's "less
recommended"
is that it's easier to keep a single copy consistant on shared storage as
it's
automatic. However the method you mention must be kept consistant manually. I
don't think there is any greater risk for data loss other than that. Others 
may know more tho...


Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Mathieu Avila
Sent: Thursday, May 11, 2006 2:45 AM
To: linux clustering
Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file

Hello,

In GFS 6.0, has anybody experienced using a CCA archive on a local file
instead of a shared volume or server ?
More precisely, if using this method correctly by managing the consistency of
the configuration file over all nodes, is there any greater risk of data
corruption than with a shared volume archive or a server ? I am in the
special case where i don't want to manage another shared volume, and this
option, although less documented, seems better to me. Documentation only
tells that it is "less recommanded".

Thanks in advance,

--
Mathieu

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From stephen.willey at framestore-cfc.com  Thu May 11 14:29:24 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Thu, 11 May 2006 15:29:24 +0100
Subject: [Linux-cluster] gfs_fsck problems (not doing get_get_meta_buffer)
Message-ID: <44634A44.9060309@framestore-cfc.com>

gfs_fsck seems to break my filesystem!

Here's the sequence of events (everything acts as expected unless I
state otherwise):

pvcreate /dev/sda; pvcreate /dev/sdb
vgcreate gfs_vg /dev/sda /dev/sdb
vgdisplay
lvcreate -l 4171379 gfs_vg -n gfs_lv (the extents number obviously
gleaned from vgdisplay)
vgchange -aly
gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 8 /dev/gfs_vg/gfs_lv

mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
df -h /mnt/disk2
cd /mnt/disk2
touch 1 2 3 4 5 6 7 8 9 10
ls -lh

cd ..
umount /mnt/disk2
gfs_fsck -nvv /dev/gfs_vg/gfs_lv (output below - notice I'm running it
read-only)

Initializing fsck
Initializing lists...
Initializing special inodes...
(file.c:45)     readi:  Offset (640) is >= the file size (640).
(super.c:208)   8 journals found.
ATTENTION -- not doing gfs_get_meta_buffer...

mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
cd /mnt/disk2 (successful)
ls -lh (successful)

cd ..
umount /mnt/disk2
gfs_fsck -vv /dev/gfs_vg/gfs_lv (output below)

Initializing fsck
Initializing lists...
(bio.c:140)     Writing to 65536 - 16 4096
Initializing special inodes...
(file.c:45)     readi:  Offset (640) is >= the file size (640).
(super.c:208)   8 journals found.
ATTENTION -- not doing gfs_get_meta_buffer...

mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 (output below)

mount: No such file or directory

The syslog shows:

Lock_Harness 2.6.9-34.R5.2 (built May 11 2006 14:15:58) installed
May 11 15:12:43 gfstest1 kernel: GFS 2.6.9-34.R5.2 (built May 11 2006
14:16:10) installed
May 11 15:12:43 gfstest1 kernel: GFS: Trying to join cluster "fsck_dlm",
"mycluster:gfs1"
May 11 15:12:43 gfstest1 kernel: lock_harness:  can't find protocol fsck_dlm
May 11 15:12:43 gfstest1 kernel: GFS: can't mount proto = fsck_dlm,
table = mycluster:gfs1, hostdata =
May 11 15:12:43 gfstest1 mount: mount: No such file or directory
May 11 15:12:43 gfstest1 gfs: Mounting GFS filesystems:  failed

If I use the following to change the lock method, I can mount it again:

gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm

but shortly after I'll sometimes get I/O errors on the drive not letting
me cd into it or ls or df.

fsck isn't supposed to break clean filesystems so does anyone have any
ideas?

FYI - The other machines in the cluster were at no point mounting the
filesystem during this exercise.

Stephen



From lhh at redhat.com  Thu May 11 14:29:39 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 11 May 2006 10:29:39 -0400
Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E87@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E87@tmaemail.techma.com>
Message-ID: <1147357779.11396.115.camel@ayanami.boston.redhat.com>

On Thu, 2006-05-11 at 07:20 -0400, Kovacs, Corey J. wrote:
> I've used it (in testing) and I think the main reasons it's "less
> recommended"
> is that it's easier to keep a single copy consistant on shared storage as
> it's
> automatic. However the method you mention must be kept consistant manually. I
> don't think there is any greater risk for data loss other than that. Others 
> may know more tho...

That's correct.

-- Lon




From lhh at redhat.com  Thu May 11 14:31:53 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 11 May 2006 10:31:53 -0400
Subject: [Linux-cluster] fence_manual problem
In-Reply-To: <BAY22-F12E0AA524FCB75CB22ACEAA6AF0@phx.gbl>
References: <BAY22-F12E0AA524FCB75CB22ACEAA6AF0@phx.gbl>
Message-ID: <1147357913.11396.118.camel@ayanami.boston.redhat.com>

On Thu, 2006-05-11 at 08:21 +0000, nattapon viroonsri wrote:
> I use rhcs 4  on rhel 4.0
> I have setup 2 node cluster use manual fenceing
> node name = cluster1 , cluster2
> It failover completely if i stop service for each nod
> But when i try to disconnected cable from cluster1 both node try to fence 
> each other and
> have following log:
> 
> May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1"
> May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: 
> fence_manual no node name
> May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed
> 
> I try to run  "fence_ack_manual -n node1" but it's out put show that  have 
> no file "/tmp/fence_manual.fifo"
> so i create "/tmp/fence_manual.fifo" manually  and re run fence_ack_manual 
> it show "done"
> 
> but in the logfile still the same as if no thing happen and  service still 
> not failover

I think the UI is supposed to provide a nodename="name_of_node" in the
fence device reference under the given node, but doesn't.

I also thought it was fixed recently *scratches head*...

-- Lon



From roman.tobjasz at 7bulls.com  Thu May 11 15:00:34 2006
From: roman.tobjasz at 7bulls.com (Roman Tobjasz)
Date: Thu, 11 May 2006 17:00:34 +0200
Subject: [Linux-cluster] IP resource
Message-ID: <20060511150034.GA32599@warszawa.7bulls.com>

I configured two node cluster.
On each node I created bonding device (bond0) as primary network
interface.
On the 1st node bond0 I assigned IP address 192.168.1.100 (network
192.168.1.0, netmask 255.255.255.0).
On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and
netmask like above).
Next I created IP address 172.16.10.10 as a resource and added it to a
service.
Service doesn't start. If I change resource IP to 192.168.1.200
then service starts corectly.

Is it possible to set up resource IP which isn't from this same
network as primary network interface ?

Best regards.



From cjk at techma.com  Thu May 11 16:09:25 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 11 May 2006 12:09:25 -0400
Subject: [Linux-cluster] cman/dlm errors
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E89@tmaemail.techma.com>

There seem to be lot bugs filed against cman and dlm in bugzilla post update
3 and I believe
I am seeing some of the same problems. In particular, after some time
running, if I manually
kick one of the nodes in any way (fence, pull the cord, whatever) it ends up
taking one of the
remaining nodes with it due to a problem in the membership routines of cman.
This is also a
major pain since this will happen due to what appear to be dlm issues and a
node will fall on
it's face anwyay, then bring down another member. We are exporting 6 gfs
filesystems via
nfs which is where the dlm problem seems to stem from.

So, I have two questions for the redhat cluster folks....

1. When is the errata scheduled to come out that covers the latest round of
bugs?

2. When the next version of GFS is released, will the new architecture
replace the current one 
    for RHEL4 or will it be a RHEL5 only version? I believe the former is
true, but I'd like to here
    it from the redhat folks.

>From what I've read lately from all the "features" documents flying around
(Thanks for those by the way)
things look much better on the horizon then they are right now.



Regards



Corey


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060511/6892df81/attachment.htm>

From jason at monsterjam.org  Fri May 12 01:51:49 2006
From: jason at monsterjam.org (Jason)
Date: Thu, 11 May 2006 21:51:49 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E86@tmaemail.techma.com>
References: <20060511005321.GA45370@monsterjam.org>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079E86@tmaemail.techma.com>
Message-ID: <20060512015149.GB64851@monsterjam.org>

ok, so reading the docs and your example, they reference /dev/sdb1
this is still the 10 meg partition that i create with fdisk.. right?
then what about the rest of the disk? do I need to reference it as a pooldevice as well?
i.e. 
/dev/sdb1   <-10 meg partition
/dev/sdb2   <--- rest of logical disk ??

Jason

On Thu, May 11, 2006 at 07:16:14AM -0400, Kovacs, Corey J. wrote:
> Jason, the docs should run through the creation of the pool devices. They can
> be
> a bit of a labrynth though, so here is an example called "pool_cca.cfg".
> 
> 
> <----cut here---->
> poolname pool_cca			#name of the pool/volume to create
> subpools 1				#how many subpools make up this
> pool/volume (always starts as 1)
> subpool 0 128 1 gfs_data	#first subpool, zero indexed, 128k stripe, 1
> devices
> pooldevice 0 0 /dev/sdb1	#physical device for pool 0, device 0 (again,
> zero indexed)
> <-end cut here -->
> 
> Additional pools just need a different "poolname" and "pooldevice". 
> 
> NOTE, the cluster nodes need to be "seeing" the devices listed as pooldevices
> the same
> way. node1 sees the second physical disk as /dev/sdb, then third as /dev/sdc
> and so on.
> 
> 
> Now, if you make /dev/sdb1 about 10MB, you'll have enough space to create a
> cluster
> config pool. Then to actually use it, you need to do the following...
> 
> pool_tool -c pool_cca.cfg
> 
> then you can issue ...
> 
> service pool start
> 
> on all nodes. Just make sure all nodes have a clean view of the partition 
> table (reboot, or issue partprobe).
> 
> Once you have the cca pool created and activated, you can apply the cluster
> config
> to it...
> 
> ccs_tool create /path/to/configs/   /dev/pool/pool_cca
> 
> Then do a "service ccsd start" on all nodes followed by "service lock_gulmd
> start"
> on all nodes..
> 
> To check to see if things are working...do...
> 
> gulm_tool nodelist nameofalockserver
> 
> and you should see a list of your nodes and some info about each one.
> 
> That's should be enough to get you started. to add storage for actual gfs
> filesystems, simply
> create more pools. you can also expand pools by adding subpools after
> creation. It's sort of
> a poor mans volume management if you will. It can be done to a running system
> and the filesystem
> on top of it can be expaned live as well.
> 
> 
> Anyway, hope this helps...
> 
> 
> Corey
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Wednesday, May 10, 2006 8:53 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] question about creating partitions and gfs
> 
> ummm I was thinking that was the answer too, but I have no idea what the
> "pool" device is..
> how can I tell?
> 
> Jason
> 
> 
> On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote:
> > Jason, I just realized what the problem is. You need to apply the 
> > config to a "pool"
> > not a normal device.  What do your pooll definitions look like? The 
> > one you created for the config is where you need to point ccs_tool at 
> > to activate the config...
> > 
> > 
> > Corey
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
> > Sent: Wednesday, May 10, 2006 8:31 AM
> > To: linux clustering
> > Subject: RE: [Linux-cluster] question about creating partitions and 
> > gfs
> > 
> > Jason, couple of questions.... (And I assume you are working with
> > RHEL3+GFS6.0x)
> > 
> > 
> > 1. Are you actually using raw devices? if so, why? 
> > 2. Does the device /dev/raw/raw64 actually exist on tf2?
> > 
> > 
> > GFS does not use raw devices for anything. The standard Redhat Cluster 
> > suite does, but not GFS. GFS uses "storage pools".  Also, if memory 
> > servs me right, later versions of GFS for RHEL3 need to be told what 
> > pools to use in the "/etc/sysconfig/gfs" config file. Used to be that 
> > GFS just did a scan and "found" the pools, but no longer I believe.
> > 
> > Hope this helps. If not, can you give more details about your config? 
> > 
> > 
> > 
> > Corey
> > 
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > Sent: Tuesday, May 09, 2006 8:23 PM
> > To: linux clustering
> > Subject: Re: [Linux-cluster] question about creating partitions and 
> > gfs
> > 
> > yes, both boxes are connected to the storage, its a dell powervault 
> > 220S configured for cluster mode.
> > 
> > [root at tf1 cluster]#  fdisk -l /dev/sdb
> > 
> > Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 
> > 16065 * 512 = 8225280 bytes
> > 
> >    Device Boot    Start       End    Blocks   Id  System
> > /dev/sdb1             1      2433  19543041   83  Linux
> > [root at tf1 cluster]#
> > 
> > [root at tf2 cluster]# fdisk -l /dev/sdb
> > 
> > Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 
> > 16065 * 512 = 8225280 bytes
> > 
> >    Device Boot    Start       End    Blocks   Id  System
> > /dev/sdb1             1      2433  19543041   83  Linux
> > [root at tf2 cluster]#
> > 
> > 
> > so both sides see the storage.  
> > 
> > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8 
> > 22:00:21
> > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or 
> > address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2 ccsd:
> > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
> > May  9
> > 20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable to 
> > open
> > /dev/sdb1 (/dev/raw/raw64): No such device or address May  9 20:17:30 
> > tf2
> > ccsd: startup failed
> > [root at tf2 cluster]#
> > 
> > in the logs
> > 
> > Jason
> > 
> > 
> > 
> > 
> > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> > > Do you have a shared storage device? If /dev/sdb1 is not a shared 
> > > device, then I think you might need to take a step back and get a 
> > > hold of a SAN of some type. If you are just playing around, there 
> > > are ways to get some firewire drives to accept
> > > 
> > > two hosts and act like a cheap shared devices. There are docs on the 
> > > Oracle site documenting the process of setting up the drive and the 
> > > kernel. Note, that you'll only be able to use two nodes using the 
> > > firewire idea.
> > > 
> > > Also, you should specify a partition for the command below. That 
> > > partition can be very small. Something on the order of 10MB sounds 
> > > right. Even that is probably way too big. Then use the rest for GFS 
> > > storage pools.
> > > 
> > > 
> > > Corey
> > > 
> > > -----Original Message-----
> > > From: linux-cluster-bounces at redhat.com 
> > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > > Sent: Monday, May 08, 2006 9:32 PM
> > > To: linux-cluster at redhat.com
> > > Subject: [Linux-cluster] question about creating partitions and gfs
> > > 
> > > so still following instructions at
> > > http://www.gyrate.org/archives/9
> > > im at the part that says
> > > 
> > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> > > 
> > > in my config, I have the dell PERC 4/DC cards, and I believe the 
> > > logical drive showed up as /dev/sdb
> > > 
> > > so do I need to create a partition on this logical drive with fdisk 
> > > first before I run
> > > 
> > >  ccs_tool create /root/cluster  /dev/sdb1
> > > 
> > > or am I totally off track here?
> > > 
> > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> > > fine, but doesnt seem right..
> > > 
> > > Jason
> > > 
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > > 
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > ================================================
> > |    Jason Welsh   jason at monsterjam.org        |
> > | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> > |    gpg key: http://monsterjam.org/gpg/       |
> > ================================================
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> ================================================
> |    Jason Welsh   jason at monsterjam.org        |
> | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> |    gpg key: http://monsterjam.org/gpg/       |
> ================================================
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================



From nitin.prakash at qsoftindia.com  Fri May 12 02:43:01 2006
From: nitin.prakash at qsoftindia.com (Nitin)
Date: Fri, 12 May 2006 08:13:01 +0530
Subject: [Linux-cluster] Re: Redhat Cluster
Message-ID: <1147401781.4033.4.camel@localhost.localdomain>

Dear All,

   
      I installed redhat cluster suite on 2 node cluster, i configured
NFS service by using NFS druid. But we i am going to start the service
buy clicking start cluster locally only one node it is showing started
but when i go to other node and start the cluster both nodes are
restarted. When i start cluster buy issuing command clumanager in both
the node again the nodes are restarting.

 Please tell the solution for this problem.

Regards
Nitin. P  



From cjk at techma.com  Fri May 12 11:21:44 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Fri, 12 May 2006 07:21:44 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
In-Reply-To: <20060512015149.GB64851@monsterjam.org>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E8B@tmaemail.techma.com>

Yes, you need to create pool devices for all things gfs, the first of which 
is the cluster configuration archive. You'll need to make more for actual GFS
filesystems you want to create. You can think of pools as cluster aware
volumes.
Just as in LVM, pools relate to volumes which relate to "presented devices".

Make sense?  Good luck!


Corey

 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
Sent: Thursday, May 11, 2006 9:52 PM
To: linux clustering
Subject: Re: [Linux-cluster] question about creating partitions and gfs

ok, so reading the docs and your example, they reference /dev/sdb1 this is
still the 10 meg partition that i create with fdisk.. right?
then what about the rest of the disk? do I need to reference it as a
pooldevice as well?
i.e. 
/dev/sdb1   <-10 meg partition
/dev/sdb2   <--- rest of logical disk ??

Jason

On Thu, May 11, 2006 at 07:16:14AM -0400, Kovacs, Corey J. wrote:
> Jason, the docs should run through the creation of the pool devices. 
> They can be a bit of a labrynth though, so here is an example called 
> "pool_cca.cfg".
> 
> 
> <----cut here---->
> poolname pool_cca			#name of the pool/volume to create
> subpools 1				#how many subpools make up this
> pool/volume (always starts as 1)
> subpool 0 128 1 gfs_data	#first subpool, zero indexed, 128k stripe, 1
> devices
> pooldevice 0 0 /dev/sdb1	#physical device for pool 0, device 0 (again,
> zero indexed)
> <-end cut here -->
> 
> Additional pools just need a different "poolname" and "pooldevice". 
> 
> NOTE, the cluster nodes need to be "seeing" the devices listed as 
> pooldevices the same way. node1 sees the second physical disk as 
> /dev/sdb, then third as /dev/sdc and so on.
> 
> 
> Now, if you make /dev/sdb1 about 10MB, you'll have enough space to 
> create a cluster config pool. Then to actually use it, you need to do 
> the following...
> 
> pool_tool -c pool_cca.cfg
> 
> then you can issue ...
> 
> service pool start
> 
> on all nodes. Just make sure all nodes have a clean view of the 
> partition table (reboot, or issue partprobe).
> 
> Once you have the cca pool created and activated, you can apply the 
> cluster config to it...
> 
> ccs_tool create /path/to/configs/   /dev/pool/pool_cca
> 
> Then do a "service ccsd start" on all nodes followed by "service 
> lock_gulmd start"
> on all nodes..
> 
> To check to see if things are working...do...
> 
> gulm_tool nodelist nameofalockserver
> 
> and you should see a list of your nodes and some info about each one.
> 
> That's should be enough to get you started. to add storage for actual 
> gfs filesystems, simply create more pools. you can also expand pools 
> by adding subpools after creation. It's sort of a poor mans volume 
> management if you will. It can be done to a running system and the 
> filesystem on top of it can be expaned live as well.
> 
> 
> Anyway, hope this helps...
> 
> 
> Corey
> 
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Wednesday, May 10, 2006 8:53 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] question about creating partitions and 
> gfs
> 
> ummm I was thinking that was the answer too, but I have no idea what 
> the "pool" device is..
> how can I tell?
> 
> Jason
> 
> 
> On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote:
> > Jason, I just realized what the problem is. You need to apply the 
> > config to a "pool"
> > not a normal device.  What do your pooll definitions look like? The 
> > one you created for the config is where you need to point ccs_tool 
> > at to activate the config...
> > 
> > 
> > Corey
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
> > Sent: Wednesday, May 10, 2006 8:31 AM
> > To: linux clustering
> > Subject: RE: [Linux-cluster] question about creating partitions and 
> > gfs
> > 
> > Jason, couple of questions.... (And I assume you are working with
> > RHEL3+GFS6.0x)
> > 
> > 
> > 1. Are you actually using raw devices? if so, why? 
> > 2. Does the device /dev/raw/raw64 actually exist on tf2?
> > 
> > 
> > GFS does not use raw devices for anything. The standard Redhat 
> > Cluster suite does, but not GFS. GFS uses "storage pools".  Also, if 
> > memory servs me right, later versions of GFS for RHEL3 need to be 
> > told what pools to use in the "/etc/sysconfig/gfs" config file. Used 
> > to be that GFS just did a scan and "found" the pools, but no longer I
believe.
> > 
> > Hope this helps. If not, can you give more details about your config? 
> > 
> > 
> > 
> > Corey
> > 
> > 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > Sent: Tuesday, May 09, 2006 8:23 PM
> > To: linux clustering
> > Subject: Re: [Linux-cluster] question about creating partitions and 
> > gfs
> > 
> > yes, both boxes are connected to the storage, its a dell powervault 
> > 220S configured for cluster mode.
> > 
> > [root at tf1 cluster]#  fdisk -l /dev/sdb
> > 
> > Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of
> > 16065 * 512 = 8225280 bytes
> > 
> >    Device Boot    Start       End    Blocks   Id  System
> > /dev/sdb1             1      2433  19543041   83  Linux
> > [root at tf1 cluster]#
> > 
> > [root at tf2 cluster]# fdisk -l /dev/sdb
> > 
> > Disk /dev/sdb: 146.5 GB, 146548981760 bytes
> > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of
> > 16065 * 512 = 8225280 bytes
> > 
> >    Device Boot    Start       End    Blocks   Id  System
> > /dev/sdb1             1      2433  19543041   83  Linux
> > [root at tf2 cluster]#
> > 
> > 
> > so both sides see the storage.  
> > 
> > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May  8
> > 22:00:21
> > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device 
> > or address May  8 22:00:21 tf2 ccsd: startup failed May  9 20:17:21 tf2
ccsd:
> > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address 
> > May  9
> > 20:17:21 tf2 ccsd: startup failed May  9 20:17:30 tf2 ccsd: Unable 
> > to open
> > /dev/sdb1 (/dev/raw/raw64): No such device or address May  9 
> > 20:17:30
> > tf2
> > ccsd: startup failed
> > [root at tf2 cluster]#
> > 
> > in the logs
> > 
> > Jason
> > 
> > 
> > 
> > 
> > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote:
> > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. 
> > > Do you have a shared storage device? If /dev/sdb1 is not a shared 
> > > device, then I think you might need to take a step back and get a 
> > > hold of a SAN of some type. If you are just playing around, there 
> > > are ways to get some firewire drives to accept
> > > 
> > > two hosts and act like a cheap shared devices. There are docs on 
> > > the Oracle site documenting the process of setting up the drive 
> > > and the kernel. Note, that you'll only be able to use two nodes 
> > > using the firewire idea.
> > > 
> > > Also, you should specify a partition for the command below. That 
> > > partition can be very small. Something on the order of 10MB sounds 
> > > right. Even that is probably way too big. Then use the rest for 
> > > GFS storage pools.
> > > 
> > > 
> > > Corey
> > > 
> > > -----Original Message-----
> > > From: linux-cluster-bounces at redhat.com 
> > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> > > Sent: Monday, May 08, 2006 9:32 PM
> > > To: linux-cluster at redhat.com
> > > Subject: [Linux-cluster] question about creating partitions and 
> > > gfs
> > > 
> > > so still following instructions at
> > > http://www.gyrate.org/archives/9
> > > im at the part that says
> > > 
> > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1"
> > > 
> > > in my config, I have the dell PERC 4/DC cards, and I believe the 
> > > logical drive showed up as /dev/sdb
> > > 
> > > so do I need to create a partition on this logical drive with 
> > > fdisk first before I run
> > > 
> > >  ccs_tool create /root/cluster  /dev/sdb1
> > > 
> > > or am I totally off track here?
> > > 
> > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work 
> > > fine, but doesnt seem right..
> > > 
> > > Jason
> > > 
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > > 
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > ================================================
> > |    Jason Welsh   jason at monsterjam.org        |
> > | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> > |    gpg key: http://monsterjam.org/gpg/       |
> > ================================================
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> ================================================
> |    Jason Welsh   jason at monsterjam.org        |
> | http://monsterjam.org    DSS PGP: 0x5E30CC98 |
> |    gpg key: http://monsterjam.org/gpg/       |
> ================================================
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cosimo at streppone.it  Fri May 12 12:54:10 2006
From: cosimo at streppone.it (Cosimo Streppone)
Date: Fri, 12 May 2006 14:54:10 +0200
Subject: [Linux-cluster] RHCS4 heartbeat configuration
Message-ID: <44648572.6030105@streppone.it>

I read the CS4 manual, but I can't seem to find
a way to configure the heartbeat behaviour.

I have a 2-nodes cluster and I'd like to set up
main network interface on eth0 and heartbeat
interface on eth1 (or serial port? or both?).

Do I need to run the piranha_gui?
I'm using a different httpd, not that shipped with RHEL,
so I'm having a hard time running piranha_gui...

Is it possible to manually configure it?
In this case, what is the configuration file for heartbeat?
I found an empty lvs.cf but I don't know what it is.

Maybe I'm asking too many questions...
If I missed the obvious, please point me to the right
manual section where this is explained.

Thank you.

-- 
Cosimo



From lhh at redhat.com  Fri May 12 14:04:20 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 12 May 2006 10:04:20 -0400
Subject: [Linux-cluster] IP resource
In-Reply-To: <20060511150034.GA32599@warszawa.7bulls.com>
References: <20060511150034.GA32599@warszawa.7bulls.com>
Message-ID: <1147442660.11396.134.camel@ayanami.boston.redhat.com>

On Thu, 2006-05-11 at 17:00 +0200, Roman Tobjasz wrote:
> I configured two node cluster.
> On each node I created bonding device (bond0) as primary network
> interface.
> On the 1st node bond0 I assigned IP address 192.168.1.100 (network
> 192.168.1.0, netmask 255.255.255.0).
> On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and
> netmask like above).
> Next I created IP address 172.16.10.10 as a resource and added it to a
> service.
> Service doesn't start. If I change resource IP to 192.168.1.200
> then service starts corectly.
> 
> Is it possible to set up resource IP which isn't from this same
> network as primary network interface ?

Not currently.  The IP address selects its interface based on existing
IP addresses.

Ex.  If you have eth0 on 192.168.0.0/16 and eth1 on 172.16.0.0/16, and
add an IP 192.168.1.2, it will go on eth0.  If you add IP 172.16.1.2, it
will go on eth1.

Why is it done this way?

It's done this way because cluster nodes are not assumed to have all
NICs assigned the same ways.  For example, if you tell an IP to always
bind to eth0, and another node has eth0 on another network (but eth1 on
the correct network), the IP will be added to the wrong interface.  The
link will be up, but the service will be completely inaccessible to
clients.

The easy solution, I think, is to just add an IP to your bond0 interface
which is on the subnet, even if you shut off all traffic to that
interface using iptables.

-- Lon



From lhh at redhat.com  Fri May 12 14:11:10 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 12 May 2006 10:11:10 -0400
Subject: [Linux-cluster] RHCS4 heartbeat configuration
In-Reply-To: <44648572.6030105@streppone.it>
References: <44648572.6030105@streppone.it>
Message-ID: <1147443070.11396.136.camel@ayanami.boston.redhat.com>

On Fri, 2006-05-12 at 14:54 +0200, Cosimo Streppone wrote:
> I read the CS4 manual, but I can't seem to find
> a way to configure the heartbeat behaviour.
> 
> I have a 2-nodes cluster and I'd like to set up
> main network interface on eth0 and heartbeat
> interface on eth1 (or serial port? or both?).
> 
> Do I need to run the piranha_gui?

Nope -- not specifically, but it's a whole lot easier.

> I'm using a different httpd, not that shipped with RHEL,
> so I'm having a hard time running piranha_gui...

That isn't surprising. ;)

> Is it possible to manually configure it?

man 5 lvs.cf

> In this case, what is the configuration file for heartbeat?

For piranha, /etc/sysconfig/ha/lvs.cf

-- Lon




From lhh at redhat.com  Fri May 12 14:11:46 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 12 May 2006 10:11:46 -0400
Subject: [Linux-cluster] Re: Redhat Cluster
In-Reply-To: <1147401781.4033.4.camel@localhost.localdomain>
References: <1147401781.4033.4.camel@localhost.localdomain>
Message-ID: <1147443106.11396.138.camel@ayanami.boston.redhat.com>

On Fri, 2006-05-12 at 08:13 +0530, Nitin wrote:
   
>       I installed redhat cluster suite on 2 node cluster, i configured
> NFS service by using NFS druid. But we i am going to start the service
> buy clicking start cluster locally only one node it is showing started
> but when i go to other node and start the cluster both nodes are
> restarted. When i start cluster buy issuing command clumanager in both
> the node again the nodes are restarting.
> 
>  Please tell the solution for this problem.

What version of clumanager is it?

-- Lon




From vlaurenz at advance.net  Fri May 12 20:11:26 2006
From: vlaurenz at advance.net (Vito Laurenza)
Date: Fri, 12 May 2006 16:11:26 -0400
Subject: [Linux-cluster] CS4 & RHEL4: File System Errors
Message-ID: <4464EBEE.8030802@advance.net>

I'm having some issues with Cluster Suite 4 on RHEL4 u3.

I have a two node cluster with shared storage on a SAN.  The service is 
running in an Active-Passive environment hence only one node should be 
accessing the file system at a time.

I *was* having a problem where a manual failover (clusvcadm -r 
<service>) would fail time to time because the unmounting the file 
system failed due to being "busy".  I then added force_unmount=1 to my 
fs tag in cluster.conf to ensure the file system was unmounted.  This 
seemed to solve the failover failures, however, I began to get journal 
errors on my shared storage.

1.  Are these errors cropping up because of the forced unmount?
2.  How can I ensure that a mount or unmount done by CS is clean?

Thanks in advance.

:::: Vito Laurenza
::   Systems Administrator
::   Advance Internet
::   201.793.1807
::   vlaurenz at advance.net



From cosimo at streppone.it  Fri May 12 21:36:56 2006
From: cosimo at streppone.it (Cosimo Streppone)
Date: Fri, 12 May 2006 23:36:56 +0200
Subject: [Linux-cluster] RHCS4 heartbeat configuration
In-Reply-To: <1147443070.11396.136.camel@ayanami.boston.redhat.com>
References: <44648572.6030105@streppone.it>
	<1147443070.11396.136.camel@ayanami.boston.redhat.com>
Message-ID: <4464FFF8.3010000@streppone.it>

Lon Hohberger wrote:

> On Fri, 2006-05-12 at 14:54 +0200, Cosimo Streppone wrote:
> 
>>I read the CS4 manual, but I can't seem to find
>>a way to configure the heartbeat behaviour.
> 
> man 5 lvs.cf

Don't know why I didn't think of firing up man... :-)
Thanks for your assistance.

-- 
Cosimo



From jason at monsterjam.org  Sat May 13 02:34:16 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 12 May 2006 22:34:16 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
	(getting closer!)
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E8B@tmaemail.techma.com>
References: <20060512015149.GB64851@monsterjam.org>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079E8B@tmaemail.techma.com>
Message-ID: <20060513023416.GA62167@monsterjam.org>

woohoo!
I got it figgered out..
Ive got 
/dev/sdb1   (10 megs)
/dev/sdb2   (rest of disk)
make the pools, did the ccs_tool create ,
did service ccsd start
did service lock_gulmd start  (but had to figger out my DNS issues first ;)
now im at the point where I do 
gfs_mkfs -p lock_gulm -t  bla bla

and so now im doing 

[root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 8 /dev/pool/pool0
gfs_mkfs: Partition too small for number/size of journals
[root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 4 /dev/pool/pool0
gfs_mkfs: Partition too small for number/size of journals
[root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 2 /dev/pool/pool0
gfs_mkfs: Partition too small for number/size of journals
[root at tf1 cluster]# 

and cant figure out why its giving me grief

heres my pools config.

poolname pool0                       #name of the pool/volume to create
subpools 1                              #how many subpools make up this 
subpool 0 128 2 gfs_data        #first subpool, zero indexed, 128k stripe, 1 
pooldevice 0 0 /dev/sdb1        #physical device for pool 0, device 0 (again, zero indexed)
pooldevice 0 1 /dev/sdb2        #physical device for pool 0, device 1 (again, zero indexed)


regards,
Jason



From sunjw at onewaveinc.com  Sat May 13 04:10:50 2006
From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=)
Date: Sat, 13 May 2006 12:10:50 +0800
Subject: [Linux-cluster] gfs withdrawed in function blkalloc_internal
Message-ID: <SERVERi5mMcVmO5CCja0004d33c@mail.onewaveinc.com>

Hi,all

I have a test cluster with 3 nodes which are nd09, nd10 and nd12.
The cluster software is the newest branch of STABLE, the kernel is 2.6.15.

In nd12:
I have 11 process to sequentially write to the GFS without speed limit,
each process will remove an oldest file after write finish of a newest file.
1 process to do 'ls' of the whole GFS.
200 thread to concurrently read 200 files which are written by the above processes.
5 process to do 'df' of the GFS with 0.5 second interval.

In nd10:
I have 1 process to write.
200 thread to read the same files in nd12.
1 process to do 'ls'.
5 process to do 'df'. 

In nd09:
200 thread to read the same files in nd12.
1 process to do 'ls'.
5 process to do 'df'.

After about 10 hours of the test, gfs withdrawed in node nd10 and nd12, the messages were:
<--
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: fatal: assertion "x <= length" failed
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2:   function = blkalloc_internal
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2:   file = /home/sunjw/projects/cluster.STABLE/gfs-

kernel/src/gfs/rgrp.c
, line = 1458
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2:   time = 1147476646
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: about to withdraw from the cluster
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: waiting for outstanding I/O
May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: telling LM to withdraw
May 13 07:30:49 nd12 kernel: lock_dlm: withdraw abandoned memory
May 13 07:30:49 nd12 kernel: GFS: fsid=test:gfs-dm1.2: withdrawn


May 13 07:30:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: jid=2: Trying to acquire journal lock...
May 13 07:30:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: jid=2: Busy
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: fatal: assertion "x <= length" failed
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1:   function = blkalloc_internal
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1:   file = /home/sunjw/projects/cluster.STABLE/gfs-

kernel/src/gfs/rgrp.c
, line = 1458
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1:   time = 1147477010
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: about to withdraw from the cluster
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: waiting for outstanding I/O
May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: telling LM to withdraw
May 13 07:36:54 nd10 kernel: lock_dlm: withdraw abandoned memory
May 13 07:36:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: withdrawn


May 13 01:20:05 nd09 kernel: dlm: gfs-dm1: process_lockqueue_reply id 62f203f3 state 0
May 13 01:41:09 nd09 kernel: dlm: gfs-dm1: process_lockqueue_reply id 6fa600de state 0
May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Trying to acquire journal lock...
May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Looking at journal...
May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Acquiring the transaction lock...
May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Replaying journal...
May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Replayed 160 of 532 blocks
May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: replays = 160, skips = 99, sames = 273
May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Journal replayed in 1s
May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Done
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Trying to acquire journal lock...
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Looking at journal...
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Acquiring the transaction lock...
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Replaying journal...
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Replayed 6 of 71 blocks
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: replays = 6, skips = 4, sames = 61
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Journal replayed in 1s
May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Done
-->
The clock of 3 nodes are not in synchronization.
What should be the problem? 

Thanks for any reply,
Luckey



From jason at monsterjam.org  Sat May 13 17:49:09 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 13 May 2006 13:49:09 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
	(getting closer!)
In-Reply-To: <20060513023416.GA62167@monsterjam.org>
References: <20060512015149.GB64851@monsterjam.org>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079E8B@tmaemail.techma.com>
	<20060513023416.GA62167@monsterjam.org>
Message-ID: <20060513174909.GB49184@monsterjam.org>

ok, figured that out too.. 
http://www.redhat.com/archives/linux-cluster/2005-January/msg00032.html
is what helped.

one last newbie question.. (i hope)
I had to mount my new gfs filesystem manually with 
mount -t gfs  /dev/pool/gfs1 /mnt/gfs/
the service gfs start did nothing..
returned a prompt seemingly without doing anything.. no errors, nothing in syslog.. nuthing..

hopefully Ill figger this out too.

Jason




On Fri, May 12, 2006 at 10:34:16PM -0400, Jason wrote:
> woohoo!
> I got it figgered out..
> Ive got 
> /dev/sdb1   (10 megs)
> /dev/sdb2   (rest of disk)
> make the pools, did the ccs_tool create ,
> did service ccsd start
> did service lock_gulmd start  (but had to figger out my DNS issues first ;)
> now im at the point where I do 
> gfs_mkfs -p lock_gulm -t  bla bla
> 
> and so now im doing 
> 
> [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 8 /dev/pool/pool0
> gfs_mkfs: Partition too small for number/size of journals
> [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 4 /dev/pool/pool0
> gfs_mkfs: Partition too small for number/size of journals
> [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 2 /dev/pool/pool0
> gfs_mkfs: Partition too small for number/size of journals
> [root at tf1 cluster]# 
> 
> and cant figure out why its giving me grief
> 
> heres my pools config.
> 
> poolname pool0                       #name of the pool/volume to create
> subpools 1                              #how many subpools make up this 
> subpool 0 128 2 gfs_data        #first subpool, zero indexed, 128k stripe, 1 
> pooldevice 0 0 /dev/sdb1        #physical device for pool 0, device 0 (again, zero indexed)
> pooldevice 0 1 /dev/sdb2        #physical device for pool 0, device 1 (again, zero indexed)
> 
> 
> regards,
> Jason
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From jason at monsterjam.org  Sat May 13 18:03:24 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 13 May 2006 14:03:24 -0400
Subject: [Linux-cluster] question about creating partitions and gfs
	(RESOLVED)
In-Reply-To: <20060513174909.GB49184@monsterjam.org>
References: <20060512015149.GB64851@monsterjam.org>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079E8B@tmaemail.techma.com>
	<20060513023416.GA62167@monsterjam.org>
	<20060513174909.GB49184@monsterjam.org>
Message-ID: <20060513180324.GC49184@monsterjam.org>

had an entry in the /etc/fstab that it didnt like.. 
now I have 
/dev/pool/gfs1          /mnt/gfs                gfs     noatime 0 0
in the fstab..
that look sane?

Jason



From jason at monsterjam.org  Sat May 13 19:07:36 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 13 May 2006 15:07:36 -0400
Subject: [Linux-cluster] question about rebooting master server
Message-ID: <20060513190736.GE49184@monsterjam.org>

so I have both servers tf1, and tf2 connected to shared storage Dell 220S with 6.0.2.
They both seem to access the shared storage fine, but if I reboot the node thats the master,
the slave cannot access the shared storage until the master comes back up..
heres the info from the logs.

(reboot of tf2, and this is the log on tf1)
May 13 14:55:29 tf1 heartbeat: [5333]: info: local resource transition completed.
May 13 14:55:35 tf1 kernel: lock_gulm: Checking for journals for node "tf2.localdomain"
May 13 14:55:35 tf1 lock_gulmd_core[5007]: Master Node has logged out. 
May 13 14:55:35 tf1 kernel: lock_gulm: Checking for journals for node "tf2.localdomain"
May 13 14:55:36 tf1 lock_gulmd_core[5007]: I see no Masters, So I am Arbitrating until enough Slaves talk 
to me. 
May 13 14:55:36 tf1 lock_gulmd_core[5007]: Could not send quorum update to slave tf1.localdomain 
May 13 14:55:36 tf1 lock_gulmd_LTPX[5014]: New Master at tf1.localdomain:192.168.1.5 
May 13 14:55:57 tf1 lock_gulmd_core[5007]: Timeout (15000000) on fd:6 (tf2.localdomain:192.168.1.6) 
May 13 14:56:32 tf1 last message repeated 2 times
May 13 14:57:40 tf1 last message repeated 4 times
May 13 14:58:31 tf1 last message repeated 3 times
May 13 14:58:45 tf1 lock_gulmd_core[5007]: Now have Slave quorum, going full Master. 
May 13 14:58:45 tf1 lock_gulmd_core[5007]: New Client: idx:2 fd:6 from (192.168.1.6:tf2.localdomain) 
May 13 14:58:45 tf1 lock_gulmd_LT000[5010]: New Client: idx 2 fd 7 from (192.168.1.5:tf1.localdomain) 
May 13 14:58:45 tf1 lock_gulmd_LTPX[5014]: Logged into LT000 at tf1.localdomain:192.168.1.5 
May 13 14:58:45 tf1 lock_gulmd_LTPX[5014]: Finished resending to LT000 
May 13 14:58:46 tf1 lock_gulmd_LT000[5010]: Attached slave tf2.localdomain:192.168.1.6 idx:3 fd:8 (soff:3 
connected:0x8) 
May 13 14:58:46 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock...
May 13 14:58:46 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Looking at journal...
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Done
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock...
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Busy
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock...
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Busy
May 13 14:58:47 tf1 lock_gulmd_LT000[5010]: New Client: idx 4 fd 9 from (192.168.1.6:tf2.localdomain) 

is this normal? I would assume that when the master was rebooted, the other node should still be able to 
access the storage with no problems.

regards,
Jason






From kanderso at redhat.com  Sat May 13 19:37:23 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Sat, 13 May 2006 14:37:23 -0500
Subject: [Linux-cluster] question about rebooting master server
In-Reply-To: <20060513190736.GE49184@monsterjam.org>
References: <20060513190736.GE49184@monsterjam.org>
Message-ID: <1147549043.3077.3.camel@localhost.localdomain>

On Sat, 2006-05-13 at 15:07 -0400, Jason wrote:
> so I have both servers tf1, and tf2 connected to shared storage Dell 220S with 6.0.2.
> They both seem to access the shared storage fine, but if I reboot the node thats the master,
> the slave cannot access the shared storage until the master comes back up..
> heres the info from the logs.

> 
> is this normal? I would assume that when the master was rebooted, the other node should still be able to 
> access the storage with no problems.
> 
Yes it is normal.  The gulm lock manager requires a minimum of 3 nodes
in order to be able to determine who is master.  With only two nodes
running and you lose one, the remaining node has no way to determine
that you are not in a split brain situation.  So, the lock manager waits
until quorum is restored.  For a two node cluster, you need to be
running the GFS 6.1 and DLM for a lock manager on a 2.6 kernel.

Kevin



From jason at monsterjam.org  Sat May 13 19:46:33 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 13 May 2006 15:46:33 -0400
Subject: [Linux-cluster] question about rebooting master server
In-Reply-To: <1147549043.3077.3.camel@localhost.localdomain>
References: <20060513190736.GE49184@monsterjam.org>
	<1147549043.3077.3.camel@localhost.localdomain>
Message-ID: <20060513194633.GG49184@monsterjam.org>

> Yes it is normal.  The gulm lock manager requires a minimum of 3 nodes
> in order to be able to determine who is master.  With only two nodes
> running and you lose one, the remaining node has no way to determine
> that you are not in a split brain situation.  So, the lock manager waits
> until quorum is restored.  For a two node cluster, you need to be
> running the GFS 6.1 and DLM for a lock manager on a 2.6 kernel.

aww man that blows.. ok, so assuming I get this 
Red Hat Enterprise Linux AS release 3 up to the 2.6 kernel and 
reinstall from the srpms at 
ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHGFS/i386/SRPMS

does anyone forsee any problems?? I mean running the 6.1 GFS on the 2.6 kernel on a base
Red Hat Enterprise Linux AS release 3 box?

Jason



From ookami at gmx.de  Sat May 13 22:36:13 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Sat, 13 May 2006 16:36:13 -0600
Subject: [Linux-cluster] cman ignores interface setting on ipv4
Message-ID: <200605131636.13466.ookami@gmx.de>

I am running the fc4 and installed the cluster tools with yum. I think my cman 
version is 1.0.0, is it possible that this version still ignores the 
interface settings. Does anybody know, where to get newer versions of the 
cluster software without compiling it myself?

Thanks!

> Hi,
>
> The current ipv4 code in the stable branch for cman completely ignores
> the interface="" attribute for multicast. I've attached a minimal patch
> that fixes that.
>
> I've only done minimal testing (ie it works here).. it will probably
> break if there is no interface set, etc.. Have fun ;)
>
> --
> Olivier Cr?te
> ocrete max-t com
> Maximum Throughput Inc.
>
> Index: cman/cman_tool/join.c
> ===================================================================
> RCS file: /cvs/cluster/cluster/cman/cman_tool/join.c,v
> retrieving revision 1.12.2.7.4.1
> diff -u -r1.12.2.7.4.1 join.c
> --- cman/cman_tool/join.c       31 May 2005 15:08:24 -0000     
> 1.12.2.7.4.1 +++ cman/cman_tool/join.c       19 Jul 2005 22:14:45 -0000
> @@ -79,6 +79,7 @@
>      int ret;
>      int he_errno;
>      uint32_t bcast;
> +    struct ifreq ifr;
>
>      memset(&mcast_sin, 0, sizeof(mcast_sin));
>      mcast_sin.sin_family = AF_INET;
> @@ -148,11 +149,14 @@
>
>      /* Join the multicast group */
>      if (bhe) {
> -       struct ip_mreq mreq;
> +       struct ip_mreqn mreq;
>         char mcast_opt;
>
>         memcpy(&mreq.imr_multiaddr, bhe->h_addr, bhe->h_length);
> -       memcpy(&mreq.imr_interface, he->h_addr, he->h_length);
> +       // memcpy(&mreq.imr_address, he->h_addr, he->h_length);
> +       mreq.imr_ifindex = if_nametoindex(comline->interfaces[num]);
> +       printf("num %d index %d if %s mcastname %s nodename %s\n", num,
> mreq.imr_ifindex, comline->interfaces[num], comline->multicast_names[num],
> comline->nodenames[num]); +
>         if (setsockopt(mcast_sock, SOL_IP, IP_ADD_MEMBERSHIP, (void
> *)&mreq, sizeof(mreq))) die("Unable to join multicast group %s: %s\n",
> comline->multicast_names[num], strerror(errno));
>
> @@ -162,6 +166,11 @@
>
>         mcast_opt = 0;
>         setsockopt(mcast_sock, SOL_IP, IP_MULTICAST_LOOP, (void
> *)&mcast_opt, sizeof(mcast_opt)); +
> +       if (setsockopt(mcast_sock, SOL_IP, IP_MULTICAST_IF, (void *)&mreq,
> sizeof(mreq))) { +               die("Unable to set multicast interface
> %s\n", comline->interfaces[num]); +       }
> +
>      }
>
>      /* Local socket */
> @@ -169,6 +178,17 @@
>      if (local_sock < 0)
>         die("Can't open local socket: %s", strerror(errno));
>
> +    strcpy(ifr.ifr_name, comline->interfaces[num]);
> +    ifr.ifr_addr.sa_family = AF_INET;
> +
> +    if (ioctl(local_sock, SIOCGIFADDR, &ifr ) < 0)
> +           die("Can't find IP ADDR for interface: %s", strerror(errno));
> +
> +
> +
> +    memcpy(&local_sin.sin_addr, &((struct sockaddr_in
> *)&ifr.ifr_addr)->sin_addr, +          sizeof(local_sin.sin_addr));
> +
>      if (bind(local_sock, (struct sockaddr *)&local_sin,
> sizeof(local_sin))) die("Cannot bind local address: %s", strerror(errno));



From ookami at gmx.de  Sat May 13 23:23:18 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Sat, 13 May 2006 17:23:18 -0600
Subject: [Linux-cluster] magma and rgmanager compile error
Message-ID: <200605131723.18491.ookami@gmx.de>

I checked out the stable cluster software from cvs and got these two compile 
errors:

first error:
magma/lib/message.c: In function ?connect_nb?:
message.c:270: warning: pointer targets in passing argument 5 of ?getsockopt? 
differ in signedness

diff message.c message.c.~1.9.2.2.~
<       int ret, flags = 1, err;
<       unsigned int l;
---
>       int ret, flags = 1, err, l;

2nd error:
In file included from clulog.c:49:
../../include/clulog.h:49: error: multiple storage classes in declaration 
specifiers
clulog.c:67: error: static declaration of ?loglevel? follows non-static 
declaration
../../include/clulog.h:49: error: previous declaration of ?loglevel? was here
make[2]: *** [clulog.o] Error 1

rgmanager/src/clulib: diff clulog.c clulog.c.~1.2.2.1.~
67c67
< int   loglevel = LOGLEVEL_DFLT;
---
> static int   loglevel = LOGLEVEL_DFLT;

Cheers,

wolfgang



From pauli at grey.colorado.edu  Sun May 14 04:59:01 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Sat, 13 May 2006 22:59:01 -0600
Subject: [Linux-cluster] multicast howto
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E75@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E75@tmaemail.techma.com>
Message-ID: <200605132259.01964.pauli@grey.colorado.edu>

OK. I still did not get it to work. But in the meantime I simplified the setup 
so that we only have two subnets divided only by the head node. I was hoping 
that somebody could give me some hints how I can get this to work, so here is 
a detailed description of our setup:

We have a computing cluster running myrinet and all this nodes are on a 
private subnet, in the middle there is our headnode with two ethernet devices 
one connected to the myrinet guys and the other one to our labmachines. This 
second ethernet device is now also on the same subnet as the labmachines. I 
was hoping that we won't need the multicast setup anymore. Currently the 
headnode is exporting a nfs filesystem, but we want to switch it to gfs.

I head the multihome setup from the wiki working, but dlm does not support it, 
so I canceled that. 

We can not use ethernet bonding (i guess), because than the nodes would not 
find the headnode anymore as it would then be on a different subnet. 

I thought this is like a standard setup, but it seems to be strange. I wish 
there was more documentation.

Thanks for any hints.

wolfgang



From roman.tobjasz at 7bulls.com  Thu May 11 06:54:57 2006
From: roman.tobjasz at 7bulls.com (Roman Tobjasz)
Date: Thu, 11 May 2006 08:54:57 +0200
Subject: [Linux-cluster] IP resource
Message-ID: <20060511065457.GD24151@warszawa.7bulls.com>

I configured two node cluster.
On each node I created bonding device (bond0) as primary network interface.
On the 1st node bond0 I assigned IP address 192.168.1.100 (network
192.168.1.0, netmask 255.255.255.0).
On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and
netmask like above).
Next I created IP address 172.16.10.10 as a resource and added it to a
service.
Service doesn't start. If I change resource IP to 192.168.1.200
then service starts corectly.

Is it possible to set up resource IP which isn't from this same
network as primary network interface ?

Best regards.



From pcaulfie at redhat.com  Mon May 15 07:45:14 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 15 May 2006 08:45:14 +0100
Subject: [Linux-cluster] cman ignores interface setting on ipv4
In-Reply-To: <200605131636.13466.ookami@gmx.de>
References: <200605131636.13466.ookami@gmx.de>
Message-ID: <4468318A.9030901@redhat.com>

Wolfgang Pauli wrote:
> I am running the fc4 and installed the cluster tools with yum. I think my cman 
> version is 1.0.0, is it possible that this version still ignores the 
> interface settings. Does anybody know, where to get newer versions of the 
> cluster software without compiling it myself?
> 

Yes, 1.0.0 has that bug. The easiest way to get that version to use the
interface you want is to use the host name assigned to only that interface. eg
if you have two interfaces:

eth0 node1
eth1 node1a

then use node1a as the nodename in cluster.conf rather than node1
-- 

patrick



From stephen.willey at framestore-cfc.com  Mon May 15 10:46:22 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Mon, 15 May 2006 11:46:22 +0100
Subject: [Linux-cluster] Re: gfs_fsck problems (not doing
	get_get_meta_buffer)
In-Reply-To: <44634A44.9060309@framestore-cfc.com>
References: <44634A44.9060309@framestore-cfc.com>
Message-ID: <44685BFE.3010301@framestore-cfc.com>

Having looked into this a bit, it appears that gfs_fsck doesn't like
large drives.

It works fine on a 137Gb drive but fails instantly with the below
symptoms on a 10Tb RAID.

Is it still the case that GFS is not scalable to very large filesystems?

Stephen



Stephen Willey wrote:
> gfs_fsck seems to break my filesystem!
> 
> Here's the sequence of events (everything acts as expected unless I
> state otherwise):
> 
> pvcreate /dev/sda; pvcreate /dev/sdb
> vgcreate gfs_vg /dev/sda /dev/sdb
> vgdisplay
> lvcreate -l 4171379 gfs_vg -n gfs_lv (the extents number obviously
> gleaned from vgdisplay)
> vgchange -aly
> gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 8 /dev/gfs_vg/gfs_lv
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
> df -h /mnt/disk2
> cd /mnt/disk2
> touch 1 2 3 4 5 6 7 8 9 10
> ls -lh
> 
> cd ..
> umount /mnt/disk2
> gfs_fsck -nvv /dev/gfs_vg/gfs_lv (output below - notice I'm running it
> read-only)
> 
> Initializing fsck
> Initializing lists...
> Initializing special inodes...
> (file.c:45)     readi:  Offset (640) is >= the file size (640).
> (super.c:208)   8 journals found.
> ATTENTION -- not doing gfs_get_meta_buffer...
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2
> cd /mnt/disk2 (successful)
> ls -lh (successful)
> 
> cd ..
> umount /mnt/disk2
> gfs_fsck -vv /dev/gfs_vg/gfs_lv (output below)
> 
> Initializing fsck
> Initializing lists...
> (bio.c:140)     Writing to 65536 - 16 4096
> Initializing special inodes...
> (file.c:45)     readi:  Offset (640) is >= the file size (640).
> (super.c:208)   8 journals found.
> ATTENTION -- not doing gfs_get_meta_buffer...
> 
> mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 (output below)
> 
> mount: No such file or directory
> 
> The syslog shows:
> 
> Lock_Harness 2.6.9-34.R5.2 (built May 11 2006 14:15:58) installed
> May 11 15:12:43 gfstest1 kernel: GFS 2.6.9-34.R5.2 (built May 11 2006
> 14:16:10) installed
> May 11 15:12:43 gfstest1 kernel: GFS: Trying to join cluster "fsck_dlm",
> "mycluster:gfs1"
> May 11 15:12:43 gfstest1 kernel: lock_harness:  can't find protocol fsck_dlm
> May 11 15:12:43 gfstest1 kernel: GFS: can't mount proto = fsck_dlm,
> table = mycluster:gfs1, hostdata =
> May 11 15:12:43 gfstest1 mount: mount: No such file or directory
> May 11 15:12:43 gfstest1 gfs: Mounting GFS filesystems:  failed
> 
> If I use the following to change the lock method, I can mount it again:
> 
> gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm
> 
> but shortly after I'll sometimes get I/O errors on the drive not letting
> me cd into it or ls or df.
> 
> fsck isn't supposed to break clean filesystems so does anyone have any
> ideas?
> 
> FYI - The other machines in the cluster were at no point mounting the
> filesystem during this exercise.
> 
> Stephen
> 



From stephen.willey at framestore-cfc.com  Mon May 15 10:50:23 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Mon, 15 May 2006 11:50:23 +0100
Subject: [Linux-cluster] Re: Size limits of the various components
In-Reply-To: <44609694.7060609@framestore-cfc.com>
References: <44609694.7060609@framestore-cfc.com>
Message-ID: <44685CEF.9070600@framestore-cfc.com>

I've had no replies to this but following the recent failure of gfs_fsck
I'm guessing GFS still doesn't scale well.

Or am I missing something?

Stephen



Stephen Willey wrote:
> We're testing GFS on 64 bit servers/64 bit RHEL4 and need to know how
> big LVM2 and GFS will scale.
> 
> Can anyone tell me the maximum sizes of these component parts:
> 
> GFS filesystem
> (C)LVM2 logical volume
> (C)LVM2 volume group
> (C)LVM2 physical volumes
> 
> We're considering building a filesystem that may need to scale to 100Tb
> or more and I've found various different answers on this list and elsewhere.
> 
> Stephen
> 



From Jon.Stanley at savvis.net  Mon May 15 13:21:08 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Mon, 15 May 2006 08:21:08 -0500
Subject: [Linux-cluster] Re: Size limits of the various components
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901DE25D5@s228130hz1ew08.apptix-01.savvis.net>

 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephen Willey
> Sent: Monday, May 15, 2006 6:50 AM
> To: linux-cluster at redhat.com
> Cc: Daire Byrne
> Subject: [Linux-cluster] Re: Size limits of the various components
> 
> I've had no replies to this but following the recent failure 
> of gfs_fsck
> I'm guessing GFS still doesn't scale well.
> 
> Or am I missing something?
> 
> Stephen
> 

All of this being said, I've found that a filesystem of any type really
doesn't scale well beyond 500GB.  Not a technioal limitation really, but
rather one imposed by backup limitations - at the restore rates that we
see (using tape), that would take over a *year* to restore a 100TB
filesystem.  By splitting it, you have the advantage of being able to
use multiple tape drives and simultaneous restore sessions.

I'm assuming that either the system is not backed up and the data is not
critical, or you have some other method for restoring the filesystem
should it go south????



From awone at arrow.com  Mon May 15 13:11:12 2006
From: awone at arrow.com (Allen Wone)
Date: Mon, 15 May 2006 13:11:12 +0000 (UTC)
Subject: [Linux-cluster] Re: gfs withdrawed in function blkalloc_internal
References: <SERVERi5mMcVmO5CCja0004d33c@mail.onewaveinc.com>
Message-ID: <loom.20060515T151002-830@post.gmane.org>

Have you gotten any resolution on this?  I am having the same issue.





From teigland at redhat.com  Mon May 15 20:14:37 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 15 May 2006 15:14:37 -0500
Subject: [Linux-cluster] Re: gfs withdrawed in function blkalloc_internal
In-Reply-To: <SERVERi5mMcVmO5CCja0004d33c@mail.onewaveinc.com>
References: <SERVERi5mMcVmO5CCja0004d33c@mail.onewaveinc.com>
Message-ID: <20060515201437.GB9050@redhat.com>

On Sat, May 13, 2006 at 12:10:50PM +0800, ?????? wrote:

> The clock of 3 nodes are not in synchronization.
> What should be the problem? 

I can't explain the assertion, it wouldn't be caused by the clocks.
Unsynchronized clocks can slow down gfs significantly, though, by causing
constant inode atime updating/locking.

Dave



From teigland at redhat.com  Mon May 15 20:21:54 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 15 May 2006 15:21:54 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doing
	get_get_meta_buffer)
In-Reply-To: <44685BFE.3010301@framestore-cfc.com>
References: <44634A44.9060309@framestore-cfc.com>
	<44685BFE.3010301@framestore-cfc.com>
Message-ID: <20060515202154.GC9050@redhat.com>

On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote:
> Having looked into this a bit, it appears that gfs_fsck doesn't like
> large drives.
> 
> It works fine on a 137Gb drive but fails instantly with the below
> symptoms on a 10Tb RAID.
> 
> Is it still the case that GFS is not scalable to very large filesystems?

It's probably more a case of no one ever even trying trying fsck on a fs
that large given how long it would probably take.

Dave



From mwill at penguincomputing.com  Mon May 15 15:29:01 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Mon, 15 May 2006 08:29:01 -0700
Subject: [Linux-cluster] Re: gfs_fsck problems (not
	doingget_get_meta_buffer)
Message-ID: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com>

Is the fs code closer to ext than to say xfs?

 -----Original Message-----
From: 	David Teigland [mailto:teigland at redhat.com]
Sent:	Mon May 15 08:23:42 2006
To:	Stephen Willey
Cc:	Daire Byrne; linux-cluster at redhat.com
Subject:	Re: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer)

On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote:
> Having looked into this a bit, it appears that gfs_fsck doesn't like
> large drives.
> 
> It works fine on a 137Gb drive but fails instantly with the below
> symptoms on a 10Tb RAID.
> 
> Is it still the case that GFS is not scalable to very large filesystems?

It's probably more a case of no one ever even trying trying fsck on a fs
that large given how long it would probably take.

Dave

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060515/d7bd4bce/attachment.htm>

From stephen.willey at framestore-cfc.com  Mon May 15 15:33:03 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Mon, 15 May 2006 16:33:03 +0100
Subject: [Linux-cluster] Re: gfs_fsck problems (not doing
	get_get_meta_buffer)
In-Reply-To: <20060515202154.GC9050@redhat.com>
References: <44634A44.9060309@framestore-cfc.com>
	<44685BFE.3010301@framestore-cfc.com>
	<20060515202154.GC9050@redhat.com>
Message-ID: <44689F2F.6000002@framestore-cfc.com>

I filed a bug with redhat on this and it was a duplicate of this bug:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125

Stephen



David Teigland wrote:
> On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote:
>> Having looked into this a bit, it appears that gfs_fsck doesn't like
>> large drives.
>>
>> It works fine on a 137Gb drive but fails instantly with the below
>> symptoms on a 10Tb RAID.
>>
>> Is it still the case that GFS is not scalable to very large filesystems?
> 
> It's probably more a case of no one ever even trying trying fsck on a fs
> that large given how long it would probably take.
> 
> Dave
> 



From teigland at redhat.com  Mon May 15 20:40:45 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 15 May 2006 15:40:45 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not
	doingget_get_meta_buffer)
In-Reply-To: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com>
References: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com>
Message-ID: <20060515204045.GD9050@redhat.com>

On Mon, May 15, 2006 at 08:29:01AM -0700, Michael Will wrote:
> Is the fs code closer to ext than to say xfs?

I wouldn't say GFS code is close to anything.

Dave



From jmendler at ucla.edu  Mon May 15 15:46:24 2006
From: jmendler at ucla.edu (Jordan Mendler)
Date: Mon, 15 May 2006 08:46:24 -0700
Subject: [Linux-cluster] RHEL AS4 Server Farming options
Message-ID: <1147707984.1177.9.camel@localhost.localdomain>

I am looking to build a server farm for 5-10 RHEL AS4 web servers that
handle at a given time anywhere from 30-50 different web domains and
their sites (using AOLServer). I am in the preliminary steps on
researching this and so far the only program I have heard about is Lnux
Virtual Server. Can anyone tell me if there are any other software
options to consider for this project aside from LVS and the pro's and
cons of LVS versus something else?

Also if anyone has any pointers, tips, praise or criticism, or other
good information I would love to hear it now before I start building the
farm and it is too late. Lastly any good websites or other documentation
aside from the RHAS v2.1 LVS section and LVS's webpage would be greatly
appreciated.

Thanks, Jordan



From Jon.Stanley at savvis.net  Mon May 15 15:55:48 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Mon, 15 May 2006 10:55:48 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not
	doingget_get_meta_buffer)
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>

 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephen Willey
> Sent: Monday, May 15, 2006 11:33 AM
> To: David Teigland
> Cc: Daire Byrne; linux-cluster at redhat.com
> Subject: Re: [Linux-cluster] Re: gfs_fsck problems (not 
> doingget_get_meta_buffer)
> 
> I filed a bug with redhat on this and it was a duplicate of this bug:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125
> 
> Stephen
> 

To which mere mortals do not have access :-(



From rpeterso at redhat.com  Mon May 15 16:05:26 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Mon, 15 May 2006 11:05:26 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doing
	get_get_meta_buffer)
Message-ID: <1147709126.16950.45.camel@technetium.msp.redhat.com>

Stephen Willey wrote:
> gfs_fsck seems to break my filesystem!

This is a known problem documented in Bugzilla as bz 186125.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125

There is a hotfix available for it that may be downloaded
from:

http://seg.rdu.redhat.com/scripts/hotfix/edit.pl?id=980 

Regards,

Bob Peterson
Red Hat Cluster Suite




From hson at ludd.luth.se  Mon May 15 16:22:29 2006
From: hson at ludd.luth.se (=?ISO-8859-1?Q?Roger_H=E5kansson?=)
Date: Mon, 15 May 2006 18:22:29 +0200
Subject: [Linux-cluster] RH Cluster Suite 4 + GFS + iptables
In-Reply-To: <442D46C3.6050807@dsic.upv.es>
References: <442D46C3.6050807@dsic.upv.es>
Message-ID: <4468AAC5.4000403@ludd.luth.se>

Jose Luis Beti wrote:
> Hi everybody,
> I'm new in the list, so I apologize if this question has been answered
> before.
> 
> Anyone could explain me how to configure IPTABLES  to allow working
> right Redhat Cluster Suite RHEL4 + GFS?
> What ports and protocols (tcp, udp) should I configure?
> 

I've been searching through the archives and this question pops up now
and again, but I can't find any answers except some old ones back in
2004 when similar questions wer answered in a bunch of separate mails.
But those answers were related to RHEL3 and from what I can see, some
things have changed

This was the answer back then:
"Gulm uses the following by default:
 40040  core
 40042  ltpx
 41040  lt000
  if you set lt_partitions to >1 then
 41041  lt001 (and up to what ever you set lt_partitions to.)"
"CCS is 50006 and 50005"
"the gnbd server uses 14243"
"34001 - 34004 for clumanager"
"Also 1228 / 1229 for broadcast / multicast heartbeating"



>From the output from netstat, this is the listening ports I can see on
my CentOS4-setup :

Unknown processes ($NODENAME is the IP of the cluster node, and
$BROADCAST is the broadcast address of the cluster node's network):
TCP $NODENAME:21064
UDP $NODENAME:6809
UDP $BROADCAST:6809

clurgmgrd:
TCP *:41966
TCP *:41968
TCP *:41967
TCP *:41969

ccsd:
TCP localhost:50006
UDP *:50007
TCP *:50008
TCP *:50009


Also, I can see a active tcp connection between
$nodeA:21064<->$nodeB:32774 and $nodeB:21064<->$nodeA:32773
--
Roger H?kansson




From rpeterso at redhat.com  Mon May 15 16:41:53 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Mon, 15 May 2006 11:41:53 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not
	doingget_get_meta_buffer)
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <1147711313.16950.53.camel@technetium.msp.redhat.com>

On Mon, 2006-05-15 at 10:55 -0500, Stanley, Jon wrote:
> To which mere mortals do not have access :-(

Hi Jon,

The fix was to fs_bmap.c.  The CVS source tree has the fixes for
the RHEL4, STABLE and HEAD branches, which should take care of most
people, RHEL4, Fedora, or otherwise. And that is public.  Here's a link:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_fsck/fs_bmap.c?cvsroot=cluster 

No one has requested a fix for RHEL3, so one hasn't been done.
In addition, I tried to open up this bugzilla for better viewing.

Regards,

Bob Peterson
Red Hat Cluster Suite




From pauli at grey.colorado.edu  Mon May 15 17:15:36 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Mon, 15 May 2006 11:15:36 -0600
Subject: [Linux-cluster] cman ignores interface setting on ipv4
In-Reply-To: <4468318A.9030901@redhat.com>
References: <200605131636.13466.ookami@gmx.de> <4468318A.9030901@redhat.com>
Message-ID: <200605151115.36176.pauli@grey.colorado.edu>

Thanks, that worked. Patick said I can not use the multihome setup with dlm, 
can I do that with gulm?

wolfi



From rmm-linux-cluster at z.odi.ac  Mon May 15 20:13:37 2006
From: rmm-linux-cluster at z.odi.ac (Ross Mellgren)
Date: Mon, 15 May 2006 16:13:37 -0400
Subject: [Linux-cluster] Periodic hang of file system accesses using
 GFS/GNBD (gnbd (pid 12082: du) got signal 1)
Message-ID: <4468E0F1.9090108@z.odi.ac>

Hi, I have a two-node cluster where each node exports filesystems to the
other node, e.g.

nodeA:
  2tb array /dev/sdc
  LVM PV/VG/LV created
  /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea
  /dev/sdc is exported via gnbd
  nodeb gnbd (/dev/sdc) device is imported
  /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb

nodeB:
  2tb array /dev/sdc
  LVM PV/VG/LV created
  /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb
  /dev/sdc is exported via gnbd
  nodea gnbd (/dev/sdc) device is imported
  /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea

Everything seemed to work fine when I set it up. I ran some bonnie++
tests with pretty vigorous settings, on each node against it's local GFS
and on each node against the remote GFS, and the same simultaneously,
everything worked fine.

I've now put 200+gb of data on it and I'm encountering the problem where
normal processes like find, du, or ls hang against nodeb's array while
on nodea. Messages like the following appear in the dmesg on nodea (note
that I have not used kill on any of these processes, so I'm not kill
-9'ing them to get this):

gnbd (pid 12082: du) got signal 9
gnbd0: Send control failed (result -4)
gnbd0: Receive control failed (result -32)
gnbd0: shutting down socket
exitting GNBD_DO_IT ioctl
resending requests
gnbd (pid 12082: du) got signal 1
gnbd0: Send control failed (result -4)
gnbd (pid 20598: find) got signal 9
gnbd0: Send control failed (result -4)
gnbd (pid 4238: diff) got signal 9
gnbd0: Send control failed (result -4)
gnbd0: Receive control failed (result -32)
gnbd0: shutting down socket
exitting GNBD_DO_IT ioctl
resending requests

Looking at the code with my limited knowledge of kernel programming, it
looks like this means that a SIGKILL/SIGSEGV got trapped during the
sock_sendmsg/sock_recvmsg? It's pretty easy to get this problem to manifest.

I can clear the hang by doing gndb_export -O -R on the server (nodeb)
and reexport. The client (nodea) automatically picks up the
disconnect/reconnect and SIGKILL's the hung process.

After this has happened a bunch of times, it looks like the GFS has got
a little corrupted -- I ran a gfs_fsck -y -v on it and it cleaned up a
bunch of fsck bitmap mismatches.

It doesn't look like network connectivity is being lost at all between
the two nodes, but I can't be absolutely sure a single packet didn't get
dropped here or there.

Any help would be greatly appreciated!

-Ross


Vital statistics of the systems (both are running identical kernel +
GFS/GNBD/CMAN/etc modules, compiled on one and copied to the other)

Linux nodea 2.6.12.6 #2 SMP Fri Apr 14 19:59:14 EDT 2006 i686 i686 i386
GNU/Linux

cman-kernel-2.6.11.5-20050601.152643.FC4
dlm-kernel-2.6.11.5-20050601.152643.FC4
gfs-kernel-2.6.11.8-20050601.152643.FC4
gnbd-kernel-2.6.11.2-20050420.133124.FC4

Both boxes are dual xeons 2.8ghz with 4gb ram each (but with the BIOS
memory mapping issue that prevents us from seeing all 4gb, so really
3.3gb). The arrays are SATA arrays on top of Areca cards -- one box has
dual ARC-1120's and the other has a single ARC-1160 split up using LVM.




From basv at sara.nl  Tue May 16 12:53:23 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 16 May 2006 14:53:23 +0200
Subject: [Linux-cluster] Module gfs_kernel does not compile from CVS stable
Message-ID: <4469CB43.1080509@sara.nl>

Hello,

  Due to the new build software i always get an error that it can not
  find <cluster/cnxman.h>. It is in /usr/include/cluster/cnxman,h but not
  in the kernel source directory.

  A simple solution is to  make in kernel source directory a link to
  the one in /usr/include/cluster:
    - cd /usr/src/linux/include
    - ln -s /usr/include/cluster .

This solves my problem, but maybe there is a better solution?

Regards
-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From system_admin at pah156.warszawa.sdi.tpnet.pl  Tue May 16 13:06:10 2006
From: system_admin at pah156.warszawa.sdi.tpnet.pl (Czeslaw M)
Date: Tue, 16 May 2006 15:06:10 +0200
Subject: [Linux-cluster] Web services 2 node cluster
Message-ID: <20060516130610.GA5613@pah156.warszawa.sdi.tpnet.pl>

Good Day everyone.

I have read archives but could not find answer for my case.

Situation:
2 node cluster with 5-6 web demons (Apache) running on virtual IP,
system is RedHat Enterprise 4 (Nahant Update 2).

cluster.conf looks like:
---------------- cut ----------------
<?xml version="1.0" ?>
<cluster config_version="9" name="www-cluster">
        <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="emisweb04" votes="1">
                        <multicast addr="225.0.0.13" interface="eth0"/>
                        <fence/>
                </clusternode>
                <clusternode name="emisweb02" votes="1">
                        <multicast addr="225.0.0.13" interface="eth0"/>
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.13"/>
        </cman>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="EMIS_Service" ordered="0" restricted="0">
                                <failoverdomainnode name="emisweb04" priority="1"/>
                                <failoverdomainnode name="emisweb02" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.200.26" monitor_link="1"/>
                        <script file="/etc/rc.d/init.d/apache" name="ISI Web Server"/>
                </resources>
                <service autostart="1" domain="EMIS_Service" exclusive="1" name="EMIS Web Server">
                        <ip ref="192.168.200.26"/>
                        <script ref="Multi Web Server"/>
                </service>
        </rm>
</cluster>

---------------- cut ----------------

It generally comes up and runs. Everything is on local network then no
router is involved. 
Worse comes when i try give it stresstest like shut down interface eth0.
With one httpd demon running service switches to failover node in most
casea. Generally cluster works for 90% of what I expect from it.

But with more web demons up it is not functioning properly.
Possibly /etc/rc.d/init.d/apache is written in a way improper for
cluster as it gives result:

[root at emisweb04 ~]# service apache status
httpd (pid 3344 3343 3342 3341 3340 3339 3338 3337 3336 3335 3334 3333
3332 3331 3316 3315 3314 3313 3312 3311 3310 3309 3308 3307 3306 3305
3304 3303 3302 3301 3300 3292 3291 3290 3289 3288 3287 3286 3285 3284
3283 3279 3278 3277 3276 3275 3274 3273 3272 3271 3270 3269 3268 3267
3262 3257 3252) is running...

I may kill some of demons but status will not change - many processes
will be shown as running then cluster is fooled. It doesn't react 
and does not bring killed demons back up.

Well - this is easy to resolve - I may split demons for separate services.

Worse thing is that after bringing eth0 back up multicast addres
doesn't come up and cluster demons on this node need to be restarted.

Yet more worse is that when i try to stop cluster - cman processes 
remain and in a hard way - it is not possible to kill them:

They are:
     |-cman_comms
     |-cman_hbeat
     |-cman_memb
     |-events/3---cman_serviced

Then virtual IP is not released and custer cannot me started back:
/var/log/messages:
dlm: process_cluster_request invalid lockspace 1000003 from 2 req 1
dlm: process_cluster_request invalid lockspace 1000003 from 2 req 1

Must I set fencing ? 
(for now it is not set). I'd prefer not to get into reboot loop if 
something goes wrong.

Any hint appreciated ...
May Power be with you.
-- 

          Czeslaw M <system_admin at pah156.warszawa.sdi.tpnet.pl>



From eric at bootseg.com  Tue May 16 18:17:12 2006
From: eric at bootseg.com (Eric Kerin)
Date: Tue, 16 May 2006 14:17:12 -0400
Subject: [Linux-cluster] New APC agent
In-Reply-To: <4447FA47.8090203@redhat.com>
References: <4447FA47.8090203@redhat.com>
Message-ID: <1147803432.3384.25.camel@auh5-0479.corp.jabil.org>


On Thu, 2006-04-20 at 17:16 -0400, James Parsons wrote:
> Hello all,
> 
> This is an snmp based fence agent for APC power switches to be used
> with RHEL4 Red Hat Cluster Suite.
>                                                                                 
<SNIP>
> Please let me know how this agent works.
> 
I finally got around to trying it out with my hardware, and wasn't able
to get it working as a fence agent.  I tracked it down to two problems
in the script when reading the commands from stdin.

1. Never parses the port parameter
2. Needs to trim the newline from each parameter read from stdin for the
snmpset command to work properly.

The attached patch cleared it up for me (please excuse any coding or
style mistakes, first time coding in Python)  I've also included the
relevant sections of my cluster.conf, just for completeness sake.

Once I made these changes it worked quite well on my APC 7900's.  And
now I can rename the outlets to descriptive names instead of "outlet
1".  


Now on to comments: 
It would be nice to be able to set the community via the cluster.conf
file, right now it's hard coded in the fence agent to private.


Thanks, 
Eric Kerin
eric at bootseg.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_apc_snmp-fixstdin.patch
Type: text/x-patch
Size: 661 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060516/9907603d/attachment.bin>
-------------- next part --------------
                <clusternode name="auhjpsn01a" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="AUHAPC01a" port="1" option="off"/>
                                        <device name="AUHAPC01b" port="1" option="off"/>
                                        <device name="AUHAPC01a" port="1" option="on"/>
                                        <device name="AUHAPC01b" port="1" option="on"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="auhjpsn01b" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="AUHAPC01a" port="2" option="off"/>
                                        <device name="AUHAPC01b" port="2" option="off"/>
                                        <device name="AUHAPC01a" port="2" option="on"/>
                                        <device name="AUHAPC01b" port="2" option="on"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc_snmp" ipaddr="XXX.XXX.XXX.205" login="" name="AUHAPC01a" passwd=""/>
                <fencedevice agent="fence_apc_snmp" ipaddr="XXX.XXX.XXX.206" login="" name="AUHAPC01b" passwd=""/>
        </fencedevices>

From ciril at hcl.in  Wed May 17 04:57:28 2006
From: ciril at hcl.in (Ciril Ignatious T)
Date: Wed, 17 May 2006 10:27:28 +0530
Subject: [Linux-cluster] Failover Mail Server Without Shared Storage
Message-ID: <446AAD38.2040407@hcl.in>



Hi All

I have a requirement where i need a fail over Mail Server without shared 
storage.I need a solution like server mirroring so that i can avoid 
shared storage.If any such solution is there pls help me..

Thanks and Regards

Ciril






From sunjw at onewaveinc.com  Wed May 17 09:52:13 2006
From: sunjw at onewaveinc.com (=?gb2312?B?y++/oc6w?=)
Date: Wed, 17 May 2006 17:52:13 +0800
Subject: [Linux-cluster] Re: Re: gfs withdrawed in function blkalloc_internal
Message-ID: <SERVERmvqk7gnFgit5x0004ff43@mail.onewaveinc.com>

No, I have not. But I have a question.
Which kernel do you use when you encounter this problem?
My kernel is the latest 2.6.15 branch kernel in kernel.org.

Thanks,
Luckey




From mykleb at no.ibm.com  Wed May 17 10:36:22 2006
From: mykleb at no.ibm.com (Jan-Frode Myklebust)
Date: Wed, 17 May 2006 12:36:22 +0200
Subject: [Linux-cluster] Re: Failover Mail Server Without Shared Storage
References: <446AAD38.2040407@hcl.in>
Message-ID: <slrne6lv56.j87.mykleb@99RXZYP.ibm.com>

On 2006-05-17, Ciril Ignatious T <ciril at hcl.in> wrote:
>
> I have a requirement where i need a fail over Mail Server without shared 
> storage.I need a solution like server mirroring so that i can avoid 
> shared storage.If any such solution is there pls help me..

That can be done using heartbeat from linux-ha.org, plus DRBD for mirroring 
over the network. 

	http://www.drbd.org/drbd-article.html
	http://wiki.linux-ha.org/GettingStarted/DRBD

I've used a similar setup for mail server (POP, IMAP, SMTP) with heartbeat
and ServeRAID for failing over the disks between the nodes. This has the
advantage of hardware fencing (two nodes are not allowed to use the disks 
at the same time). 

	http://wiki.linux-ha.org/ServeRAID


  -jf



From iana at atlas.ua  Wed May 17 11:19:57 2006
From: iana at atlas.ua (iana)
Date: Wed, 17 May 2006 14:19:57 +0300
Subject: [Linux-cluster] Cluster Setup Questions
Message-ID: <1974688210.20060517141957@atlas.ua>

Somehow, I din't see two my previous messages in the list.
So I try a third time.

======================
Hello Everyone.

I read the manual and the Knowledge Base, and the posts here and still
there is so much unclear that I have to ask for your support. I have a
complex  task,  so  prepare  for  a large post and complex setup. I am
mostly  new to clustering and I was given a fixed set of options, so I
wasn't able to choose hardware and software and must operate with what
I  was  given.  So,  below  are mine considerations on the "project" I
have. I ask everyone who met such setups before to help me.

The task & the environment:
===========================

I need to setup a highly-availabile cluster to run two apps (Task1 and
Task2)  that  also  communicate  tightly with each other (So I want an
"Active-Active"  setup).  I  want  to  have  as  much availability and
continuity  as  possible within my current environment (i.e. w/o bying
much  additional  hardware  and  largely  redesigning  the network).
The scheme that illustrates the setup is available here:
www.atlas.ua/apc/cluster-post.jpg (93?).

The show takes place in a large organization with two datacenters that
are   physically   separated  on  distant  sites  in  the  city  (like
catastrophe-proof  setup).  The  datacenters  are connected with Fiber
optic  running  a  number  of  protocols (so called 'Metropolitan Area
Network' - MAN). Among them there are SDH,  Gigabit Ethernet and Fible
Channel SAN.

In each datacenter there is
 * a SAN Storage (IBM DS8100, IBM ESS "Shark")
 * two SAN Switches (IBM TotalStorage SAN Switch 2109-F32)
 *  DB/App  Server  (IBM  IBM  eServer  xSeries  460 w 2 FC HBAs, each
 plugged  in  separate  SAN  Switch  and  multipathing,  running  RHEL
 4.2+RHCS)
 * Front-End Server (Win2K3 + IIS/ASP)
The SAN Storages have mirroring between them (can be turned on and off
for each partition).
 
Availability requirements:
==========================
A1)  I  want Active-Active configuration, where Node1 is running Task1
and  backing  up  Task2,  and Node2 is running Task2 and is backing up
Task1.
A2)  I  want  that  if  one  SAN Storage fails, the cluster could work
further without need to restart processes.
A3)  I  want  that  if  one  HBA  of SAN switch fails i could continue
working (currently solved with IBMsdd multipath driver)
A4)  If  server  HardDrive  fails  the  server should continue working
(solved by creating RAID 1 on servers' hot-swappable drives).
A5)  If one of two server's LAN links fails the server should continue
working  (want  to  resolve  it  by  trunking  VLANs  over  the bonded
interfaces,  but have doubts if it is possible - how does that bonding
work???).

Limitations:
============
L1) It's impossible to obtain Storage Virtualization Manager (due to a
lot  of bureaucratic problems) to make both SAN storages appear as one
to RHCS and solve requirement A3. :(((
L2)  I  am  limited  to RHEL 4u2, RHCS 4u2, because the soft is highly
sensitive to a lot of Linux bugs that appear and disappear from update
to  update..  When  I'll be able to get the cluster running on RHEL4.2
I'll make developers adapt it to QU3 or whatever it'll be.
L2.1) Also, where is no GFS.
L3)  The  staff  running  all  the  SAN,  actually, can only find that
something's  broken.  Those,  who implemented the entire large network
are long gone, and the current guys have no or little understanding of
what  their  boxes  can  do  and how ;( I'm not familiar with IBM SANs
either :( (though, I'm studying hard now).

What I did now:
===============
* attached all the hardware together.
* connected partitions on SAN storages.
* decided IP addresses.
* established two VLANs:
 + for servers' heartbeat
 + for public interfaces and common cluster IP.
* Installed RHEL 4.2, IBM sdd driver, RHCS

What has to be decided:
=======================
Currently, no fencing is decided and two options are considered:
 1. APC AP7921 Rack PDUs (still don't quite imagine how it'll work
 with servers with dual power supplies, but this can be solved)
 2. Integrated IBM Slimline RSA II controller (Don't know if it's
 supported by RHCS).

I  found  from  the  mailing  list, that there are problems with power
fencing   when  losing  LAN  connectivity  (no  heartbeat  =>  servers
powercycle each other). Haven't found a way to overcome it, though ;(

The  manual says that RHCS can work with single storage only. Is there
a  way  to  overcome it without storage virtualization? I really don't
want building software RAID arrays over MAN.

If  not  - I can establish SAN mirroring then, and mirror all the suff
from  "primary"  storage  to  backup  storage.  Is there a way to make
cluster   automatically   check   health   of   "active"  storage  and
automatically  remount  FS  from backup storage if active storage goes
down?  The only solution I found is to do this from customized service
status script..

If  so  -  can I run the cluster without common FS resource? My simple
tests show that that's possible, but want proofs.

The questions:
==============
Besides the questions asked above, the two main questions are:
 * How should I design the cluster so it works like I want?
 * How should I implement it?
 
Some smaller questions:
* If the status script returns 0 - status is ok. What happens if not?
Does the Cluster Software first try to restart the service or fails it
over to the next node? If tries to restart - by which means ('service
XXX restart', kill -9, or smth else)?

*  How  does  cluster  software  check  the  status  of infrastructure
resources  (IPs,  FSes, heartbeat). Can I change the course of actions
there by writing my own scripts?



Thanks in advance to those brave enough to answer =)






-- 
? ?????????,
 iana                          mailto:iana at atlas.ua





From jparsons at redhat.com  Wed May 17 12:40:51 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 17 May 2006 08:40:51 -0400
Subject: [Linux-cluster] Cluster Setup Questions
In-Reply-To: <1974688210.20060517141957@atlas.ua>
References: <1974688210.20060517141957@atlas.ua>
Message-ID: <446B19D3.1040208@redhat.com>

iana wrote:

>---------------8< snip, snip ---------------------------------
>
>What has to be decided:
>=======================
>Currently, no fencing is decided and two options are considered:
> 1. APC AP7921 Rack PDUs (still don't quite imagine how it'll work
> with servers with dual power supplies, but this can be solved)
> 2. Integrated IBM Slimline RSA II controller (Don't know if it's
> supported by RHCS).
>
We have a fence agent for RSA II, but it has never been tested with the 
slimline controller; only the standard RSA II controller.

 AP7921  is a supported power switch and a good choice. In a dual power 
supply environment, just configure the general device in the UI under 
fence devices, then configure two fences for each node (on the same 
fence level). The ui app will note that there are two power fences on 
the level and then write the correct 'off-on' sequence into the conf 
file. If you have any questions about the conf file and dual power 
supplies, just post your file here and someone will be able to help.

-J

>
>I  found  from  the  mailing  list, that there are problems with power
>fencing   when  losing  LAN  connectivity  (no  heartbeat  =>  servers
>powercycle each other). Haven't found a way to overcome it, though ;(
>
>The  manual says that RHCS can work with single storage only. Is there
>a  way  to  overcome it without storage virtualization? I really don't
>want building software RAID arrays over MAN.
>
>If  not  - I can establish SAN mirroring then, and mirror all the suff
>from  "primary"  storage  to  backup  storage.  Is there a way to make
>cluster   automatically   check   health   of   "active"  storage  and
>automatically  remount  FS  from backup storage if active storage goes
>down?  The only solution I found is to do this from customized service
>status script..
>
>If  so  -  can I run the cluster without common FS resource? My simple
>tests show that that's possible, but want proofs.
>
>The questions:
>==============
>Besides the questions asked above, the two main questions are:
> * How should I design the cluster so it works like I want?
> * How should I implement it?
> 
>Some smaller questions:
>* If the status script returns 0 - status is ok. What happens if not?
>Does the Cluster Software first try to restart the service or fails it
>over to the next node? If tries to restart - by which means ('service
>XXX restart', kill -9, or smth else)?
>
>*  How  does  cluster  software  check  the  status  of infrastructure
>resources  (IPs,  FSes, heartbeat). Can I change the course of actions
>there by writing my own scripts?
>
>
>
>Thanks in advance to those brave enough to answer =)
>
>
>
>
>
>
>  
>




From jparsons at redhat.com  Wed May 17 13:19:14 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 17 May 2006 09:19:14 -0400
Subject: [Linux-cluster] New APC agent
In-Reply-To: <1147803432.3384.25.camel@auh5-0479.corp.jabil.org>
References: <4447FA47.8090203@redhat.com>
	<1147803432.3384.25.camel@auh5-0479.corp.jabil.org>
Message-ID: <446B22D2.30800@redhat.com>

Hi Eric - thanks for testing and the patches.

-J

Eric Kerin wrote:

>On Thu, 2006-04-20 at 17:16 -0400, James Parsons wrote:
>  
>
>>Hello all,
>>
>>This is an snmp based fence agent for APC power switches to be used
>>with RHEL4 Red Hat Cluster Suite.
>>                                                                                
>>    
>>
><SNIP>
>  
>
>>Please let me know how this agent works.
>>
>>    
>>
>I finally got around to trying it out with my hardware, and wasn't able
>to get it working as a fence agent.  I tracked it down to two problems
>in the script when reading the commands from stdin.
>
>1. Never parses the port parameter
>2. Needs to trim the newline from each parameter read from stdin for the
>snmpset command to work properly.
>
>The attached patch cleared it up for me (please excuse any coding or
>style mistakes, first time coding in Python)  I've also included the
>relevant sections of my cluster.conf, just for completeness sake.
>
>Once I made these changes it worked quite well on my APC 7900's.  And
>now I can rename the outlets to descriptive names instead of "outlet
>1".  
>
>
>Now on to comments: 
>It would be nice to be able to set the community via the cluster.conf
>file, right now it's hard coded in the fence agent to private.
>
>
>Thanks, 
>Eric Kerin
>eric at bootseg.com
>
>
>  
>
>------------------------------------------------------------------------
>
>--- fence_apc_snmp      2006-04-20 16:28:18.000000000 -0400
>+++ fence_apc_snmp.new  2006-05-16 14:07:37.000000000 -0400
>@@ -117,7 +117,7 @@
>     #place params in dict
>     for line in sys.stdin:
>       val = line.split("=")
>-      params[val[0]] = val[1]
>+      params[val[0]] = val[1][:-1]
>
>     try:
>       address = params["ipaddr"]
>@@ -137,6 +137,12 @@
>       sys.exit(1)
>
>     try:
>+      port = params["port"]
>+    except KeyError, e:
>+      os.write(standard_err, "FENCE: Missing port param for fence_apc...exiting")
>+      sys.exit(1)
>+
>+    try:
>       a = params["option"]
>       if a == "Off" or a == "OFF" or a == "off":
>         action = POWER_OFF
>  
>
>------------------------------------------------------------------------
>
>                <clusternode name="auhjpsn01a" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="AUHAPC01a" port="1" option="off"/>
>                                        <device name="AUHAPC01b" port="1" option="off"/>
>                                        <device name="AUHAPC01a" port="1" option="on"/>
>                                        <device name="AUHAPC01b" port="1" option="on"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="auhjpsn01b" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="AUHAPC01a" port="2" option="off"/>
>                                        <device name="AUHAPC01b" port="2" option="off"/>
>                                        <device name="AUHAPC01a" port="2" option="on"/>
>                                        <device name="AUHAPC01b" port="2" option="on"/>
>                                </method>
>                        </fence>
>                </clusternode>
>        </clusternodes>
>        <cman expected_votes="1" two_node="1"/>
>        <fencedevices>
>                <fencedevice agent="fence_apc_snmp" ipaddr="XXX.XXX.XXX.205" login="" name="AUHAPC01a" passwd=""/>
>                <fencedevice agent="fence_apc_snmp" ipaddr="XXX.XXX.XXX.206" login="" name="AUHAPC01b" passwd=""/>
>        </fencedevices>
>  
>




From iana at atlas.ua  Mon May 15 16:07:24 2006
From: iana at atlas.ua (Arsen Banduryan)
Date: Mon, 15 May 2006 19:07:24 +0300
Subject: [Linux-cluster] Cluster Setup Questions
Message-ID: <835488869.20060515190724@atlas.ua>

Hello Everyone.

I read the manual and the Knowledge Base, and the posts here and still
there is so much unclear that I have to ask for your support. I have a
complex  task,  so  prepare  for  a large post and complex setup. I am
mostly  new to clustering and I was given a fixed set of options, so I
wasn't able to choose hardware and software and must operate with what
I  was  given.  So,  below  are mine considerations on the "project" I
have. I ask everyone who met such setups before to help me.

The task & the environment:
===========================

I need to setup a highly-availabile cluster to run two apps (Task1 and
Task2)  that  also  communicate  tightly with each other (So I want an
"Active-Active"  setup).  I  want  to  have  as  much availability and
continuity  as  possible within my current environment (i.e. w/o bying
much  additional  hardware  and  largely  redesigning  the network). I
attached a scheme that illustrates the setup.

The show takes place in a large organization with two datacenters that
are   physically   separated  on  distant  sites  in  the  city  (like
catastrophe-proof  setup).  The  datacenters  are connected with Fiber
optic  running  a  number  of  protocols (so called 'Metropolitan Area
Network' - MAN). Among them there are SDH,  Gigabit Ethernet and Fible
Channel SAN.

In each datacenter there is
 * a SAN Storage (IBM DS8100, IBM ESS "Shark")
 * two SAN Switches (IBM TotalStorage SAN Switch 2109-F32)
 *  DB/App  Server  (IBM  IBM  eServer  xSeries  460 w 2 FC HBAs, each
 plugged  in  separate  SAN  Switch  and  multipathing,  running  RHEL
 4.2+RHCS)
 * Front-End Server (Win2K3 + IIS/ASP)
The SAN Storages have mirroring between them (can be turned on and off
for each partition).
 
Availability requirements:
==========================
A1)  I  want Active-Active configuration, where Node1 is running Task1
and  backing  up  Task2,  and Node2 is running Task2 and is backing up
Task1.
A2)  I  want  that  if  one  SAN Storage fails, the cluster could work
further without need to restart processes.
A3)  I  want  that  if  one  HBA  of SAN switch fails i could continue
working (currently solved with IBMsdd multipath driver)
A4)  If  server  HardDrive  fails  the  server should continue working
(solved by creating RAID 1 on servers' hot-swappable drives).
A5)  If one of two server's LAN links fails the server should continue
working  (want  to  resolve  it  by  trunking  VLANs  over  the bonded
interfaces,  but have doubts if it is possible - how does that bonding
work???).

Limitations:
============
L1) It's impossible to obtain Storage Virtualization Manager (due to a
lot  of bureaucratic problems) to make both SAN storages appear as one
to RHCS and solve requirement A3. :(((
L2)  I  am  limited  to RHEL 4u2, RHCS 4u2, because the soft is highly
sensitive to a lot of Linux bugs that appear and disappear from update
to  update..  When  I'll be able to get the cluster running on RHEL4.2
I'll make developers adapt it to QU3 or whatever it'll be.
L2.1) Also, where is no GFS.
L3)  The  staff  running  all  the  SAN,  actually, can only find that
something's  broken.  Those,  who implemented the entire large network
are long gone, and the current guys have no or little understanding of
what  their  boxes  can  do  and how ;( I'm not familiar with IBM SANs
either :( (though, I'm studying hard now).

What I did now:
===============
* attached all the hardware together.
* connected partitions on SAN storages.
* decided IP addresses.
* established two VLANs:
 + for servers' heartbeat
 + for public interfaces and common cluster IP.
* Installed RHEL 4.2, IBM sdd driver, RHCS

What has to be decided:
=======================
Currently, no fencing is decided and two options are considered:
 1. APC AP7921 Rack PDUs (still don't quite imagine how it'll work
 with servers with dual power supplies, but this can be solved)
 2. Integrated IBM Slimline RSA II controller (Don't know if it's
 supported by RHCS).

I  found  from  the  mailing  list, that there are problems with power
fencing   when  losing  LAN  connectivity  (no  heartbeat  =>  servers
powercycle each other). Haven't found a way to overcome it, though ;(

The  manual says that RHCS can work with single storage only. Is there
a  way  to  overcome it without storage virtualization? I really don't
want building software RAID arrays over MAN.

If  not  - I can establish SAN mirroring then, and mirror all the suff
from  "primary"  storage  to  backup  storage.  Is there a way to make
cluster   automatically   check   health   of   "active"  storage  and
automatically  remount  FS  from backup storage if active storage goes
down?  The only solution I found is to do this from customized service
status script..

If  so  -  can I run the cluster without common FS resource? My simple
tests show that that's possible, but want proofs.

The questions:
==============
Besides the questions asked above, the two main questions are:
 * How should I design the cluster so it works like I want?
 * How should I implement it?
 
Some smaller questions:
* If the status script returns 0 - status is ok. What happens if not?
Does the Cluster Software first try to restart the service or fails it
over to the next node? If tries to restart - by which means ('service
XXX restart', kill -9, or smth else)?

*  How  does  cluster  software  check  the  status  of infrastructure
resources  (IPs,  FSes, heartbeat). Can I change the course of actions
there by writing my own scripts?



Thanks in advance to those brave enough to answer =)

-- 
Regards
 Arsen Banduryan                        mailto:iana at atlas.ua
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster-post.jpg
Type: image/jpeg
Size: 93099 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060515/d756dd21/attachment.jpg>

From nattaponv at hotmail.com  Wed May 17 18:14:10 2006
From: nattaponv at hotmail.com (nattapon viroonsri)
Date: Wed, 17 May 2006 18:14:10 +0000
Subject: [Linux-cluster] lvreduce Gfs volumn 
Message-ID: <BAY22-F115C3A8DF65822BA39749FA6A10@phx.gbl>

rhel4.0, rhcs4.0 , gfs6.1
I try to reduce size of  logical volume contain gfs file system with 
lvreduce command and have no any error.
Check with vgdisplay free extent has increased as expect and output of 
lvdisplay show that size of logical volumn
has decrease correctly.

But when i issue df command show that size of gfs file system still have the 
same size before i use lvreduce.
Do i have to do something before issue lvreduce ?

I have found only gfs_grow to expand file system.
Have any tool to shrink gfs file system before lvreduce ?

Nattapon,
Regards

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



From ookami at gmx.de  Wed May 17 20:06:22 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Wed, 17 May 2006 14:06:22 -0600
Subject: [Linux-cluster] multicast
Message-ID: <200605171406.22148.ookami@gmx.de>

Hi,

So to use gfs+dlm on a host with two interfaces (each on a different subnet) I 
probably have to use multicasting as dlm does not support multihome setup. I 
guess I have to setup up this one hosts with the two interfaces as a 
forwarder ??? Any hints where I can find docs about this? I found this nice 
howto below, but am not sure whether I have to use ipv4 or ipv6 multicast. 
This might be a strange question, but I am serious ...

http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Linux+IPv6-HOWTO.html

Wolfgang



From jbrassow at redhat.com  Thu May 18 04:28:18 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 17 May 2006 23:28:18 -0500
Subject: [Linux-cluster] lvreduce Gfs volumn 
In-Reply-To: <BAY22-F115C3A8DF65822BA39749FA6A10@phx.gbl>
References: <BAY22-F115C3A8DF65822BA39749FA6A10@phx.gbl>
Message-ID: <719ee95e99013d2ef3f8385e46e7240d@redhat.com>

GFS does not currently have the ability to shrink.  Therefore, you can 
not reduce the size of your volume.

  brassow

On May 17, 2006, at 1:14 PM, nattapon viroonsri wrote:

> rhel4.0, rhcs4.0 , gfs6.1
> I try to reduce size of  logical volume contain gfs file system with 
> lvreduce command and have no any error.
> Check with vgdisplay free extent has increased as expect and output of 
> lvdisplay show that size of logical volumn
> has decrease correctly.
>
> But when i issue df command show that size of gfs file system still 
> have the same size before i use lvreduce.
> Do i have to do something before issue lvreduce ?
>
> I have found only gfs_grow to expand file system.
> Have any tool to shrink gfs file system before lvreduce ?
>
> Nattapon,
> Regards
>
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's 
> FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From pcaulfie at redhat.com  Thu May 18 07:25:38 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 18 May 2006 08:25:38 +0100
Subject: [Linux-cluster] multicast
In-Reply-To: <200605171406.22148.ookami@gmx.de>
References: <200605171406.22148.ookami@gmx.de>
Message-ID: <446C2172.4070208@redhat.com>

Wolfgang Pauli wrote:
> Hi,
> 
> So to use gfs+dlm on a host with two interfaces (each on a different subnet) I 
> probably have to use multicasting as dlm does not support multihome setup. I 
> guess I have to setup up this one hosts with the two interfaces as a 
> forwarder ??? Any hints where I can find docs about this? I found this nice 
> howto below, but am not sure whether I have to use ipv4 or ipv6 multicast. 
> This might be a strange question, but I am serious ...

With the two interfaces on different subnets, you might be able to get it to
seem to work, but you won't get failover if one interface fails. Specifying
multicast only affects the way cman communicates - DLM uses TCP connections
based on the address passed down by cman. IP4/6 - it makes no difference.

In short: if you need interface failover with cman/dlm, you need to use
bonding - with all that that implies.

-- 

patrick



From pauli at grey.colorado.edu  Thu May 18 18:46:56 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Thu, 18 May 2006 12:46:56 -0600
Subject: [Linux-cluster] multicast
In-Reply-To: <446C2172.4070208@redhat.com>
References: <200605171406.22148.ookami@gmx.de> <446C2172.4070208@redhat.com>
Message-ID: <200605181246.57385.pauli@grey.colorado.edu>

We just decided to keep using NFS with XFS. I had the bonding before, but than 
the guys on the first subnet could not connect to the headnode anymore, 
because with the bonding the headnode only had one ip addy (which is the one 
of the other subnet, or vice versa) and I could not figure out how to change 
the route to tell those nodes that their gateway is on a different subnet (or 
whatever the problem was). And than I would probably also have to use 
multicast because the head node would be on a different subnet that those 
nodes. I just don't know enough about all that, because I am only a student 
and not a real admin.

wolfgang



From celso at webbertek.com.br  Thu May 18 18:45:13 2006
From: celso at webbertek.com.br (Celso K. Webber)
Date: Thu, 18 May 2006 15:45:13 -0300
Subject: [Linux-cluster] Link monitor under Red Hat Cluster Suite v3
Message-ID: <446CC0B9.6060204@webbertek.com.br>

Hello all,

I've been supporting a customer that runs Red Hat Cluster Suite v3.

On this opportunity, we have fully up2date all packages, including those 
from RHEL and RHCS. So we have the latest kernel with latest clumanasger 
packages.

I've noticed that the "link monitoring" feature is perhaps not working 
quite well. The servers have eth0 and eth1 bonded to bond0, and 
heartbeat is conducted through eth2.

We've tried to pull out both cables from eth0 and eth1, thus rendering 
the machine unaccessible from the corporate network, but still 
accessible from the other machine via the crossover cable on eth2.

Since we have the "monitor link" check box "on" for the virtual IP, I 
imagined that the clumanager would notice the link absense and thus 
migrate the service to the other node.

Unfortunately, the logs showed nothing and the system went running as 
they were before unpluging the cables.

Is this behaviour correct? Shouldn't clumanager notice that the cables 
went down?

I appreciate any tips on this. BTW, the relevant part of cluster.xml is 
here:
     <service checkinterval="0" failoverdomain="srvkrm" id="0" 
maxfalsestarts="5" maxrestarts="10" name="srvkrm" 
userscript="/root/bin/Start/StartDBKRM">
       <service_ipaddresses>
         <service_ipaddress broadcast="10.0.7.255" id="0" 
ipaddress="10.0.4.184" monitor_link="1" netmask="255.255.252.0"/>
       </service_ipaddresses>
       <device id="0" name="/dev/emcpoweri1" sharename="">
         <mount forceunmount="yes" fstype="ext3" mountpoint="/ext_krm" 
options=""/>
       </device>
     </service>

Thank you in advance for any help.

Regards,

Celso.
-- 
*Celso Kopp Webber*

celso at webbertek.com.br <mailto:celso at webbertek.com.br>

*Webbertek - Opensource Knowledge*
(41) 8813-1919
(41) 3284-3035



From Matthew.Patton.ctr at osd.mil  Thu May 18 22:49:54 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Thu, 18 May 2006 18:49:54 -0400
Subject: [Linux-cluster] Link monitor under Red Hat Cluster Suite v3
Message-ID: <D8063DF686D10247B0A49D01271285690CE91DEF@osdn06.osd.mil>

Classification: UNCLASSIFIED

> Since we have the "monitor link" check box "on" for the virtual IP, I 
> imagined that the clumanager would notice the link absense and thus 
> migrate the service to the other node.

I doubt your (or the default) configures cluster services to respond to
/etc/sysconfig/network-functions:do_netreport(). I think this is a perfect
application but have no idea if the function even works or if the cluster
daemons respond to a SIGIO in a useful manner. I also don't see how losing
carrier is going to trigger this function on it's own. Does clumanager
really watch link status? In all liklihood you're still going to have to
come up with a script etc. that monitors interfaces and when it notices a
link going down, invokes the alert mechanism.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060518/ca10e6ce/attachment.htm>

From keithl at mukluk.its.monash.edu.au  Fri May 19 01:26:51 2006
From: keithl at mukluk.its.monash.edu.au (Keith Lewis)
Date: Fri, 19 May 2006 11:26:51 +1000
Subject: [Linux-cluster] GFS lock held for 12 hours
Message-ID: <200605190126.k4J1QppI028664@mukluk.its.monash.edu.au>


We have a GFS cluster with 12 data nodes and 3 lock-servers.

Red Hat AS3 U7 GFS-6.0.2.30-0

The data nodes all access a SAN disk.

The SAN fabric is divided into two independent halves - called Red and Blue.
Half the data nodes on each.  The data nodes access only one disk - reachable
via either SAN.

There are other clients, other clusters and other disks sharing the SAN.

Recently a faulty HBA was plugged into a machine, not part of our cluster, and
connected to the Red SAN.

At this point the Red SAN failed, there were two main moderately immediate
results:

One of the Red SAN nodes became very busy.  Presumably it was holding a fairly
big GFS lock at the time.  But it continued to hold the lock and to send
heartbeats.  The node gave the appearance of being hung.

The rest of the Red SAN nodes, over a period of a few minutes, all presumably
did some IO to the disk and presumably got into a busy wait state, which was
so tight that they stopped sending heartbeats, and got fenced.  (APC PDU's)

On reboot these nodes could see the SAN as normal except they could not see
their SAN disk.  Nor could they see another disk added to the SAN as part of
the debugging attempted later.  

Many attempts were made to make the disk reappear, mostly by rebooting or
shutting down GFS and rmmod-ing qla2300 and modprobe-ing qla2300.  Everything
was quite normal, except the Red SAN would not let any of our nodes see our
disk.

On the Blue SAN all the machines became very busy.  Presumably because of the
one Red SAN machine holding the lock.  These nodes were also thought to be
hung, but none of them were rebooted as it was discovered that they were still
exporting an important Web tree that was not on GFS disk.  (They sprang back
to life when the one - lock holding - Red SAN machine was rebooted - which was
well after the Red SAN problem was fixed).

This state of affairs lasted 12 hours.  

Fixing it was made difficult because to anyone looking at the problem it
appeared the entire SAN and the entire cluster was down.  Very little that we
saw at the time indicated that only the Red SAN had failed.  (Hindsight is
wonderful).

This was particularly unfortunate.  The justification for installing GFS was 
resilience in the face of hardware failure.  (esp no spof).

So finally here are my questions:  

Is it really reasonable for a machine to hang onto a lock for 12 hours ?

Would it be possible for a GFS machine to detect that it cannot do IO to its
GFS disk any more and release any locks it holds - perhaps by fencing itself?

(I'm thinking of adding a cronjob that forks a subprocess that does an IO to
the GFS disk.  The parent could shutdown the node, leading to a fence, if the
child takes more than a minute).

Have I made any mistakes in my guesses and presumptions ?

Keith Lewis



From carlopmart at gmail.com  Fri May 19 07:27:08 2006
From: carlopmart at gmail.com (carlopmart)
Date: Fri, 19 May 2006 09:27:08 +0200
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>
References: <4462F013.40201@gmail.com>
	<Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>
Message-ID: <446D734C.60308@gmail.com>

Thanks Devrim. I want to run an active/passive cluster. Which can be the 
  best form without using GFS ??


Devrim GUNDUZ wrote:
> 
> Hi,
> 
> On Thu, 11 May 2006, carlopmart wrote:
> 
>> Somebody have tried to setup two nodes with Postgresql under RHCS4?. 
>> Is it possible to do this without shared storage like mysql cluster 
>> feature does?
> 
> If you want to run an active/passive cluster, then go on with 
> RHCS+ext{2,3}(or GFS). PostgreSQL cannot run on active/active cluster 
> systems, natively. However, you might give PgCluster a try. Even though 
> it is not the best way, it is worth trying. Slony-II will be 
> implementing multimaster replication feature, but it is still under 
> development.
> 
> BTW, GFS is not a prerequisite, if you run an active/passive cluster; 
> however I used it, in order to prevent a spof.
> 
> Regards,
> -- 
> Devrim GUNDUZ
> devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
>                       http://www.gunduz.org
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From Marco.Esposito at roma1.infn.it  Fri May 19 08:06:43 2006
From: Marco.Esposito at roma1.infn.it (Marco Esposito)
Date: Fri, 19 May 2006 10:06:43 +0200 (ora solare Europa occidentale)
Subject: [Linux-cluster] clustat
Message-ID: <Pine.WNT.4.64.0605190952210.1612@manchester>


hi, i have a question about my cluster...

using "clustat -x" I receive this message:

<?xml version="1.0"?>
<clustat version="4.0">
   <quorum quorate="1" groupmember="1"/>
   <nodes>
     <node name="node1.clusterlan" state="0" nodeid="0x0000000000000002"/>
     <node name="node2.clusterlan" state="1" nodeid="0x0000000000000001"/>
   </nodes>
   <groups>
     <group name="ftp-service" state="112" state_str="started" 
owner="(null)" last_owner="(null)" restarts="2"/>
   </groups>
</clustat>


Why i get "owner" and "last_owner" equal to "(null)"???
while using simply clustat:

Member Status: Quorate

   Member Name                              Status
   ------ ----                              ------
   node1.clusterlan                         Online, Local, rgmanager
   node2.clusterlan                         Online, rgmanager

   Service Name         Owner (Last)                   State
   ------- ----         ----- ------                   -----
   ftp-service          node1.clusterlan               started

what is the problem?

Thanks




From mels.kooijman at Multrix.com  Fri May 19 08:24:15 2006
From: mels.kooijman at Multrix.com (Mels Kooijman)
Date: Fri, 19 May 2006 10:24:15 +0200
Subject: [Linux-cluster] Clurgmgrd error
Message-ID: <91DF818A298B6543BF5F1C8DC6A5418714E263@ASPAMS2007.asp.multrix.local>


Hi,

We are runnen rh es 4 64 bit whith cluster.
Yesterday we had a failure with the next messages:

unxams4082:

May 17 23:21:50 unxams4082 clurgmgrd[23408]: <notice> status on ip
"192.168.50.43" returned 1 (generic error)
May 17 23:21:50 unxams4082 clurgmgrd[23408]: <notice> Stopping service
infoserver
May 17 23:21:50 unxams4082 error: starting appender syslogd, cause is
failover stop
May 17 23:21:50 unxams4082 su(pam_unix)[12905]: session opened for user
tomtom by (uid=0)
May 17 23:21:51 unxams4082 su(pam_unix)[12905]: session closed for user
tomtom
May 17 23:21:51 unxams4082 error: starting appender syslogd, cause is
failover stop
May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver
is recovering
May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing
RG status
May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG
infoserver: Invalid State 117
May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service
infoserver failed to stop cleanly
May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing
RG status

unxaal4082:

May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> Magma Event:
Membership Change
May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> State change:
unxams4082 DOWN
May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <err> #44: Cannot start RG
infoserver: Invalid State 117
May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <crit> #13: Service
infoserver failed to stop cleanly
May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Taking over
service clustat from down member (null)
May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Service clustat
started

Where can I find a description of the error numbers (55,44,13,57)?
There are 2 script started bij the cluster, clustat (for test) and
infoserver, when the scripts are called with the status option it gives
only a exit 0.
What can be the course that we get often the message:
clurgmgrd[23408]: <notice> status on ip "192.168.50.43" returned 1
(generic error)

Regards
Mels




From cjk at techma.com  Fri May 19 11:07:13 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Fri, 19 May 2006 07:07:13 -0400
Subject: [Linux-cluster] GFS lock held for 12 hours
In-Reply-To: <200605190126.k4J1QppI028664@mukluk.its.monash.edu.au>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E9B@tmaemail.techma.com>

Keith, this sounds like what might happen if the SAN fabric is 
not properly configured. The Qlogic cards will send a scsi reset
down the bus when activated. If your fabric is open to the point
where nodes can see each other , then the other nodes will receive
the scsi reset and there HBA's will go through a LIP reset. This
is not good, especially when you have more than a few machine in 
a cluster. The result is that GFS cannot see the disks for a while
and in most cases that would be for too long and nodes end up getting
fenced. 

If this is indeed the case, I'd suggest making separate zones for each
HBA<-->Storage device combination. For instance, a cluster with 3 nodes
and two HBA's each would end up having 6 zones. Each zone conisting if
an HBA and the storage it needs to access. They'll all be the same 
logically but they'll each be isolated from one another.

Also, there is a problem with the RHEL3 based GFS in that it doesn't
seem to play nice with the system with respect to lock space and memory.
GFS will in fact hog all the memory it can (for performance reasons)
to a point where the system itself cannot fork any processes. The way
around this is (in U7) manage the inoded_purge parameter for each mounted
GFS filesystem. inoded_purge is a percentage of locks held by GFS to try
and purge thereby releasing that mamory back to the system. It appears 
even if you cannot fork, lock_gulmd can still respond to the other nodes
indicating all is well when in fact it is not. The developers can surely
correct me if I am wrong but that's the way it acts. Seems to me by making
the response be a separate thread, this could be avoided since then
lock_gulmd
wold not be able to respond to the cluster heartbeat subssytem and it would 
get fenced. I'm sure there is more to it than that tho.

If I set my inoded_purge numbers to 20 and fire up an rsync I stay right
around 30,000 locks. My ststems (with 3GB ram) would get into a non-forking
state at around 380k locks.


Hope this helps.



Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Keith Lewis
Sent: Thursday, May 18, 2006 9:27 PM
To: Linux-cluster at redhat.com
Subject: [Linux-cluster] GFS lock held for 12 hours


We have a GFS cluster with 12 data nodes and 3 lock-servers.

Red Hat AS3 U7 GFS-6.0.2.30-0

The data nodes all access a SAN disk.

The SAN fabric is divided into two independent halves - called Red and Blue.
Half the data nodes on each.  The data nodes access only one disk - reachable
via either SAN.

There are other clients, other clusters and other disks sharing the SAN.

Recently a faulty HBA was plugged into a machine, not part of our cluster,
and connected to the Red SAN.

At this point the Red SAN failed, there were two main moderately immediate
results:

One of the Red SAN nodes became very busy.  Presumably it was holding a
fairly big GFS lock at the time.  But it continued to hold the lock and to
send heartbeats.  The node gave the appearance of being hung.

The rest of the Red SAN nodes, over a period of a few minutes, all presumably
did some IO to the disk and presumably got into a busy wait state, which was
so tight that they stopped sending heartbeats, and got fenced.  (APC PDU's)

On reboot these nodes could see the SAN as normal except they could not see
their SAN disk.  Nor could they see another disk added to the SAN as part of
the debugging attempted later.  

Many attempts were made to make the disk reappear, mostly by rebooting or
shutting down GFS and rmmod-ing qla2300 and modprobe-ing qla2300.  Everything
was quite normal, except the Red SAN would not let any of our nodes see our
disk.

On the Blue SAN all the machines became very busy.  Presumably because of the
one Red SAN machine holding the lock.  These nodes were also thought to be
hung, but none of them were rebooted as it was discovered that they were
still exporting an important Web tree that was not on GFS disk.  (They sprang
back to life when the one - lock holding - Red SAN machine was rebooted -
which was well after the Red SAN problem was fixed).

This state of affairs lasted 12 hours.  

Fixing it was made difficult because to anyone looking at the problem it
appeared the entire SAN and the entire cluster was down.  Very little that we
saw at the time indicated that only the Red SAN had failed.  (Hindsight is
wonderful).

This was particularly unfortunate.  The justification for installing GFS was
resilience in the face of hardware failure.  (esp no spof).

So finally here are my questions:  

Is it really reasonable for a machine to hang onto a lock for 12 hours ?

Would it be possible for a GFS machine to detect that it cannot do IO to its
GFS disk any more and release any locks it holds - perhaps by fencing itself?

(I'm thinking of adding a cronjob that forks a subprocess that does an IO to
the GFS disk.  The parent could shutdown the node, leading to a fence, if the
child takes more than a minute).

Have I made any mistakes in my guesses and presumptions ?

Keith Lewis

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From lvermeiren at core-it.be  Fri May 19 11:45:16 2006
From: lvermeiren at core-it.be (Lode Vermeiren)
Date: Fri, 19 May 2006 13:45:16 +0200
Subject: [Linux-cluster] bladecenter fencing problem
Message-ID: <1148039116.2425.16.camel@lode.coreit.be>

Hi all,

we are trying to set up a cluster between two IBM BladeCenter servers.
I have read the documentation numerous times, and searched the list
archives, but I have not yet found a clear explanation of all the
concepts of the clustering solution.

At the moment it almost works.. except the most important part, the
actual takeover.

What works:

* nodes A and B are online
* service is running on node A
* cluster services on node A are stopped (eg. clean shutdown of node A)
* service is restarted on node B

What doesn't work:
* nodes A and B are online
* service is running on node A
* node A fails (eg. powercut)
* service is not restarted on node B.. Node B tries to fence node A, but
this stalls. The service hangs at "stopping" in system-config-cluster.

This seems to be the same problem as Theodorus (theo at tkd.co.id) was
experiencing (see post <TKDNET9mK6PYzDabigP000000e6 at tkdnet.tkd.co.id> on
March 21 2006), but I haven't found follow-ups to that message.

Our environment:
2 x Xeon Blade, RHEL 4.3 x86_64, Bladecenter fencing

Any pointers to a solution or to what I'm doing wrong are greatly
appreciated!

Best,
Lode



From lhh at redhat.com  Fri May 19 14:51:52 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 19 May 2006 10:51:52 -0400
Subject: [Linux-cluster] clustat
In-Reply-To: <Pine.WNT.4.64.0605190952210.1612@manchester>
References: <Pine.WNT.4.64.0605190952210.1612@manchester>
Message-ID: <1148050312.31561.30.camel@ayanami.boston.redhat.com>

On Fri, 2006-05-19 at 10:06 +0200, Marco Esposito wrote:
> hi, i have a question about my cluster...
> 
> using "clustat -x" I receive this message:
> 
> <?xml version="1.0"?>
> <clustat version="4.0">
>    <quorum quorate="1" groupmember="1"/>
>    <nodes>
>      <node name="node1.clusterlan" state="0" nodeid="0x0000000000000002"/>
>      <node name="node2.clusterlan" state="1" nodeid="0x0000000000000001"/>
>    </nodes>
>    <groups>
>      <group name="ftp-service" state="112" state_str="started" 
> owner="(null)" last_owner="(null)" restarts="2"/>
>    </groups>
> </clustat>
> 
> 
> Why i get "owner" and "last_owner" equal to "(null)"???
> while using simply clustat:
> 
> Member Status: Quorate
> 
>    Member Name                              Status
>    ------ ----                              ------
>    node1.clusterlan                         Online, Local, rgmanager
>    node2.clusterlan                         Online, rgmanager
> 
>    Service Name         Owner (Last)                   State
>    ------- ----         ----- ------                   -----
>    ftp-service          node1.clusterlan               started
> 
> what is the problem?
> 
> Thanks
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


Outdated version of clustat...

-- Lon



From Marco.Esposito at roma1.infn.it  Fri May 19 14:56:32 2006
From: Marco.Esposito at roma1.infn.it (Marco Esposito)
Date: Fri, 19 May 2006 16:56:32 +0200 (CEST)
Subject: [Linux-cluster] clustat
In-Reply-To: <1148050312.31561.30.camel@ayanami.boston.redhat.com>
References: <Pine.WNT.4.64.0605190952210.1612@manchester>
	<1148050312.31561.30.camel@ayanami.boston.redhat.com>
Message-ID: <Pine.LNX.4.62.0605191655580.3624@manchester.roma1.infn.it>



clustat -v


clustat version 1.9.39
Connected via: CMAN/SM Plugin v1.1.2




On Fri, 19 May 2006, Lon Hohberger wrote:

> On Fri, 2006-05-19 at 10:06 +0200, Marco Esposito wrote:
>> hi, i have a question about my cluster...
>>
>> using "clustat -x" I receive this message:
>>
>> <?xml version="1.0"?>
>> <clustat version="4.0">
>>    <quorum quorate="1" groupmember="1"/>
>>    <nodes>
>>      <node name="node1.clusterlan" state="0" nodeid="0x0000000000000002"/>
>>      <node name="node2.clusterlan" state="1" nodeid="0x0000000000000001"/>
>>    </nodes>
>>    <groups>
>>      <group name="ftp-service" state="112" state_str="started"
>> owner="(null)" last_owner="(null)" restarts="2"/>
>>    </groups>
>> </clustat>
>>
>>
>> Why i get "owner" and "last_owner" equal to "(null)"???
>> while using simply clustat:
>>
>> Member Status: Quorate
>>
>>    Member Name                              Status
>>    ------ ----                              ------
>>    node1.clusterlan                         Online, Local, rgmanager
>>    node2.clusterlan                         Online, rgmanager
>>
>>    Service Name         Owner (Last)                   State
>>    ------- ----         ----- ------                   -----
>>    ftp-service          node1.clusterlan               started
>>
>> what is the problem?
>>
>> Thanks
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> Outdated version of clustat...
>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>



From lhh at redhat.com  Fri May 19 15:00:12 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 19 May 2006 11:00:12 -0400
Subject: [Linux-cluster] Clurgmgrd error
In-Reply-To: <91DF818A298B6543BF5F1C8DC6A5418714E263@ASPAMS2007.asp.multrix.local>
References: <91DF818A298B6543BF5F1C8DC6A5418714E263@ASPAMS2007.asp.multrix.local>
Message-ID: <1148050812.31561.39.camel@ayanami.boston.redhat.com>

On Fri, 2006-05-19 at 10:24 +0200, Mels Kooijman wrote:

> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver
> is recovering
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing
> RG status
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG
> infoserver: Invalid State 117
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing
> RG status

What caused unxams4082 to die?  Was there anything in dmesg, like
dlm_emergency_shutdown?


> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> Magma Event:
> Membership Change
> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> State change:
> unxams4082 DOWN
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <err> #44: Cannot start RG
> infoserver: Invalid State 117

What release of rgmanager?  This might be a bug.

> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Taking over
> service clustat from down member (null)
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Service clustat
> started
> 
> Where can I find a description of the error numbers (55,44,13,57)?

/usr/share/doc/rgmanager-*/errors.txt


> What can be the course that we get often the message:
> clurgmgrd[23408]: <notice> status on ip "192.168.50.43" returned 1
> (generic error)

Can be caused by several things... link died, pre-U3 router ping failed.
The router-ping code was removed in U3 because it caused more problems
than it solved.  If this is happening every two minutes, this is
definitely the cause.

-- Lon



From lhh at redhat.com  Fri May 19 15:03:45 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 19 May 2006 11:03:45 -0400
Subject: [Linux-cluster] bladecenter fencing problem
In-Reply-To: <1148039116.2425.16.camel@lode.coreit.be>
References: <1148039116.2425.16.camel@lode.coreit.be>
Message-ID: <1148051025.31561.43.camel@ayanami.boston.redhat.com>

On Fri, 2006-05-19 at 13:45 +0200, Lode Vermeiren wrote:

> At the moment it almost works.. except the most important part, the
> actual takeover.

Looks like it would succeed if fencing completed...

> What doesn't work:
> * nodes A and B are online
> * service is running on node A
> * node A fails (eg. powercut)
> * service is not restarted on node B.. Node B tries to fence node A, but
> this stalls. The service hangs at "stopping" in system-config-cluster.
> 
> This seems to be the same problem as Theodorus (theo at tkd.co.id) was
> experiencing (see post <TKDNET9mK6PYzDabigP000000e6 at tkdnet.tkd.co.id> on
> March 21 2006), but I haven't found follow-ups to that message.

Hrm, can we see the fence + node sections of cluster.conf?  Also, can
you tell me what happens if you run the following:

  fence_node <nodename>

Thanks!

-- Lon



From devrim at gunduz.org  Fri May 19 22:23:36 2006
From: devrim at gunduz.org (Devrim GUNDUZ)
Date: Sat, 20 May 2006 01:23:36 +0300 (EEST)
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <446D734C.60308@gmail.com>
References: <4462F013.40201@gmail.com>
	<Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>
	<446D734C.60308@gmail.com>
Message-ID: <Pine.LNX.4.63.0605191129300.7386@mail.kivi.com.tr>


Hi,

On Fri, 19 May 2006, carlopmart wrote:

> Thanks Devrim. I want to run an active/passive cluster. Which can be the 
> best form without using GFS ??

You mean filesystem? Then ext{2,3} is the alternative, depending on how 
much you trust your hardware, UPS, kernel, etc.

I'd not use reiserfs for this.

Regards,
--
Devrim GUNDUZ
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                       http://www.gunduz.org



From nattaponv at hotmail.com  Sun May 21 17:55:04 2006
From: nattaponv at hotmail.com (nattapon viroonsri)
Date: Sun, 21 May 2006 17:55:04 +0000
Subject: [Linux-cluster] Clvm on imported Gnbd Problem
Message-ID: <BAY22-F4F2F00334C03B23253879A6A50@phx.gbl>

rhel4.0,rhcs4 gfs6.1 , gnbd1.0, piranha-0.8.0-1
Load balance Structure:
director: director1
2 node(real server): cluster1, cluster2
gnbd server: gnbd_serv

Gnbd server have no cluster software enable.
I have export disk as block deives on gnbd server with following step
#gnbd_serv -n
# gnbd_export -c -e slice0 -d /dev/sdb

then import device on both real server node(cluster1,cluster2) completely
#gnbd_impoart -i gnbd_serv

and start all cluster service successfull( ccsd, cman,fence, clvm, gfs).
I builded  lvm at cluster1 (first real server node)  on imported device from 
gnbd server
and have no any error and can format device with gfs.
But when i list lvm on another node (cluster2)  with pvs and lvs command It 
have not found lvm i have created on cluster1.
cluster2 only see the original block device that import from gnbd_serv.

Clvm have start successfully but look like both node cannot sync status of 
lvm.
Do i miss some thing before create lvm on imported block device  ?

Nattapon,
Regard

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



From jason at monsterjam.org  Sun May 21 21:43:52 2006
From: jason at monsterjam.org (Jason)
Date: Sun, 21 May 2006 17:43:52 -0400
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
Message-ID: <20060521214352.GA84377@monsterjam.org>

anyone know if theres a howto for this?

Im trying to build from source on my 
Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
box and get 

[root at tf1 ~]# rpmbuild -bs /usr/src/redhat/SPECS/GFS.spec 
error: Failed build dependencies:
        GFS-kernheaders >= 2.6.9 is needed by GFS-6.1.5-0.i386
[root at tf1 ~]# 

where does GFS-kernheaders come from? cant seem to find it.

regards,
Jason



From wcheng at redhat.com  Mon May 22 00:11:23 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Sun, 21 May 2006 20:11:23 -0400
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
In-Reply-To: <20060521214352.GA84377@monsterjam.org>
References: <20060521214352.GA84377@monsterjam.org>
Message-ID: <1148256683.3873.47.camel@localhost.localdomain>

On Sun, 2006-05-21 at 17:43 -0400, Jason wrote:
> anyone know if theres a howto for this?
> 
> Im trying to build from source on my 
> Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
> box and get 
> 
> [root at tf1 ~]# rpmbuild -bs /usr/src/redhat/SPECS/GFS.spec 
> error: Failed build dependencies:
>         GFS-kernheaders >= 2.6.9 is needed by GFS-6.1.5-0.i386
> [root at tf1 ~]# 
> 
> where does GFS-kernheaders come from? cant seem to find it.
> 

Get GFS-kernheaders-2.6.9-49.1.i686.rpm from i386/RedHat/RPMS directory
(instead of i386/SRPMS) and install it. I believe you need this RPM
installed to get your build going.

-- Wendy

   



From jason at monsterjam.org  Mon May 22 00:51:50 2006
From: jason at monsterjam.org (Jason)
Date: Sun, 21 May 2006 20:51:50 -0400
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
In-Reply-To: <1148256683.3873.47.camel@localhost.localdomain>
References: <20060521214352.GA84377@monsterjam.org>
	<1148256683.3873.47.camel@localhost.localdomain>
Message-ID: <20060522005150.GA35710@monsterjam.org>

Thanks for the response Wendy, but, I couldnt find 
an RPMS dir anywhere under 

ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHCS/i386/SRPMS
ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHGFS/i386/SRPMS
ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/en/RHGFS/SRPMS/

but I did find..
http://www.redhat.com/archives/linux-cluster/2005-October/msg00076.html

and that got me farther.. right now, im stuck at 

[root at tf1 rpms]# sudo rpmbuild -bb /usr/src/redhat/SPECS/GFS-kernel.spec
error: Architecture is not included: i386
[root at tf1 rpms]# sudo rpmbuild -bb --target=i686 /usr/src/redhat/SPECS/GFS-kernel.spec
Building target platforms: i686
Building for target i686
error: Failed build dependencies:
        dlm-devel is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernheaders is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-smp-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-hugemem-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel-smp is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel-hugemem is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernheaders >= 2.6.9 is needed by GFS-kernel-2.6.9-49.1.i686
        cman-kernheaders >= 2.6.9 is needed by GFS-kernel-2.6.9-49.1.i686
[root at tf1 rpms]# 

so i did 
[root at tf1 SPECS]# sudo rpmbuild -bb  --target=i686 cman-kernel.spec 
Building target platforms: i686
Building for target i686
error: Failed build dependencies:
        kernel-smp = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-hugemem = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-smp-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-hugemem-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        fake-build-provides is needed by cman-kernel-2.6.9-36.0.i686
[root at tf1 SPECS]# 

so I guess I need to edit the spec file and set the extraversion to 
2.6.9-34

and I cant seem to find where to find a redhat version of fake-build-provides..
I have to use the one from CENTOS ? ;)
http://rpm.karan.org/el4/csgfs/SRPMS/fake-build-provides-1.0-20.src.rpm

regards,
Jason



From carlopmart at gmail.com  Mon May 22 07:25:25 2006
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 22 May 2006 09:25:25 +0200
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <Pine.LNX.4.63.0605191129300.7386@mail.kivi.com.tr>
References: <4462F013.40201@gmail.com>	<Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>	<446D734C.60308@gmail.com>
	<Pine.LNX.4.63.0605191129300.7386@mail.kivi.com.tr>
Message-ID: <44716765.1000000@gmail.com>

No Devrim, I mean which can be the best form to setup replication 
between master and slave ... and when master goes down, if I put data on 
slave how can I update master node.

many thanks Devrim.

Devrim GUNDUZ wrote:
> 
> Hi,
> 
> On Fri, 19 May 2006, carlopmart wrote:
> 
>> Thanks Devrim. I want to run an active/passive cluster. Which can be 
>> the best form without using GFS ??
> 
> You mean filesystem? Then ext{2,3} is the alternative, depending on how 
> much you trust your hardware, UPS, kernel, etc.
> 
> I'd not use reiserfs for this.
> 
> Regards,
> -- 
> Devrim GUNDUZ
> devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
>                       http://www.gunduz.org
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From devrim at gunduz.org  Mon May 22 08:14:46 2006
From: devrim at gunduz.org (Devrim GUNDUZ)
Date: Mon, 22 May 2006 11:14:46 +0300 (EEST)
Subject: [Linux-cluster] Postgresql under RHCS4
In-Reply-To: <44716765.1000000@gmail.com>
References: <4462F013.40201@gmail.com>
	<Pine.LNX.4.63.0605111111550.22324@mail.kivi.com.tr>
	<446D734C.60308@gmail.com>
	<Pine.LNX.4.63.0605191129300.7386@mail.kivi.com.tr>
	<44716765.1000000@gmail.com>
Message-ID: <Pine.LNX.4.63.0605221041220.6963@mail.kivi.com.tr>


Hi,

On Mon, 22 May 2006, carlopmart wrote:

> No Devrim, I mean which can be the best form to setup replication between 
> master and slave ... and when master goes down, if I put data on slave how 
> can I update master node.

You don't need a replication system there. Use an SAN :)

Here is the schema:

+-------+		+-------+
|Master |		|Slave	|
|	|		|	|
|Node	|		|Node	|
+-------+		+-------+
    |			    |
    --------------------------
 		|
 		|
 	+--------------
 	|	      |
 	| SAN or NAS  |
 	|	      |
 	+-------------+


You will install just operating system,PostgreSQL binaries and Cluster 
tools to both Master and Slave nodes. All $PGDATA will reside in the 
storage.

If master node goes down, Slave node will mount $PGDATA and continue 
working. When master node is up, slave will umount $PGDATA, stop its 
postmaster and will trigger master node and start its postmaster so that 
data will not be corrupted.

Regards,
--
Devrim GUNDUZ
Kivi Bili?im Teknolojileri         -          http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                       http://www.gunduz.org

From Olivier.Thibault at lmpt.univ-tours.fr  Mon May 22 09:23:18 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Mon, 22 May 2006 11:23:18 +0200
Subject: [Linux-cluster] Fencing trouble
Message-ID: <44718306.5070605@lmpt.univ-tours.fr>

Hello,

I'm testing RHCS on Fedora Core 5.
I have 2 HP DL380G4 and a MSA500G2 shared between them.
I have a gfs filesystem on the MSA500, with lock_dlm.
This works fine.
I created a IP adress service with system-config-cluster, and it also 
works fine, ie if i shutdown one node, the service start on the other one.
But I have a trouble with fencing. I use iLo fencing. The fence_ilo 
script works if I call it directly with the params, but if I run 
"fence_node node2", I get alternatively the following :

[root at filer1 ~]# fence_node filer2-ha
agent "fence_ilo" reports: error: User login name was not found
power_status: unexpected error

[root at filer1 ~]# fence_node filer2-ha
agent "fence_ilo" reports: error: Syntax error: Line #1: syntax error 
near "?>" in the line: "?xml version="1.0"?>"
power_status: unexpected error

[root at filer1 ~]# fence_node filer2-ha
agent "fence_ilo" reports: error: User login name was not found
power_status: unexpected error

[root at filer1 ~]# fence_node filer2-ha
agent "fence_ilo" reports: error: Syntax error: Line #1: syntax error 
near "?>" in the line: "?xml version="1.0"?>"
power_status: unexpected error

Here is my cluster.conf file :
<?xml version="1.0"?>
<cluster config_version="6" name="filer">
         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
         <clusternodes>
                 <clusternode name="filer1-ha" votes="1">
                         <fence>
                                 <method name="1">
                                         <device name="filer1-ilo"/>
                                 </method>
                         </fence>
                 </clusternode>
                 <clusternode name="filer2-ha" votes="1">
                         <fence>
                                 <method name="1">
                                         <device name="filer2-ilo"/>
                                 </method>
                         </fence>
                 </clusternode>
         </clusternodes>
         <cman expected_votes="1" two_node="1"/>
         <fencedevices>
                 <fencedevice agent="fence_ilo" hostname="filer1-ilo" 
login="Administrator" name="filer1-ilo" passwd="xxxxxx"/>
                 <fencedevice agent="fence_ilo" hostname="filer2-ilo" 
login="Administrator" name="filer2-ilo" passwd="xxxxxx"/>
         </fencedevices>
         <rm>
                 <failoverdomains>
                         <failoverdomain name="test" ordered="1" 
restricted="1">
                                 <failoverdomainnode name="filer1-ha" 
priority="1"/>
                                 <failoverdomainnode name="filer2-ha" 
priority="2"/>
                         </failoverdomain>
                 </failoverdomains>
                 <resources/>
                 <service autostart="1" domain="test" name="test_ip" 
recovery="relocate">
                         <ip address="10.68.5.7" monitor_link="1"/>
                 </service>
         </rm>
</cluster>


What-s wrong ?

Thanks for your help.

Best regards,

Olivier


-- 
Olivier THIBAULT
Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
Universit? Fran?ois Rabelais
Parc de Grandmont - 37200 TOURS - FRANCE
T?l: +33 2 47 36 69 12
Fax: +33 2 47 36 69 56



From f.hackenberger at mediatransfer.com  Mon May 22 09:57:28 2006
From: f.hackenberger at mediatransfer.com (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Mon, 22 May 2006 11:57:28 +0200
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
Message-ID: <44718B08.8080808@mediatransfer.com>

Hello,

I try to use the CS4 for managing a 2-node-cluster.
also i have a working fenicing-device (apc-powerswitch).
I think I have found a design-error in the CS4.

If you boot up one node(1) and this node(1) cant reach the other node(2)
node(1) trys to fence node(2).
This is fine...
but if the fencing fails, node(1)
starts his services via rgmanager without to be shure that the other
node is not working.
volia and you have a splitt-brain problem.

any solution for this?

any answer aka "use more hardware" is not a solution I think.

cu falk



From deval.kulshrestha at progression.com  Mon May 22 10:49:29 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Mon, 22 May 2006 16:19:29 +0530
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
In-Reply-To: <44718B08.8080808@mediatransfer.com>
Message-ID: <000001c67d8d$6a329b40$6200a8c0@PROGRESSION>

Hi
I have also faced similar error, raised this as bug but at last resolved
this in two ways.

1. Network Bonding(more reliable network)
2. Updated fence package.

This worked for my setup, I am using HP-iLO as fencing device.

Regard 
Deval





-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Falk Hackenberger -
MediaTransfer AG Netresearch & Consulting
Sent: Monday, May 22, 2006 3:27 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in 2-node setup

Hello,

I try to use the CS4 for managing a 2-node-cluster.
also i have a working fenicing-device (apc-powerswitch).
I think I have found a design-error in the CS4.

If you boot up one node(1) and this node(1) cant reach the other node(2)
node(1) trys to fence node(2).
This is fine...
but if the fencing fails, node(1)
starts his services via rgmanager without to be shure that the other
node is not working.
volia and you have a splitt-brain problem.

any solution for this?

any answer aka "use more hardware" is not a solution I think.

cu falk

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India



From ahmadz at amti.com.ph  Mon May 22 11:00:33 2006
From: ahmadz at amti.com.ph (Ahmadz S. Taneo)
Date: Mon, 22 May 2006 19:00:33 +0800
Subject: [Linux-cluster] INQUIRY ON CLUSTERNFS SERVER MOUNTING PROBLEM ON
	NFS CLIENT USING RCMS 3.0
In-Reply-To: <1974688210.20060517141957@atlas.ua>
Message-ID: <000801c67d8e$f32a9500$19f2a8c0@mstaneo>

Hello Here People,

Good day!

I need help. I have setup a two node NFS server using Redhat Cluster Manager
suite 3.0. I have configured and tested the nfs service for failover and it
seems working fine. Here is my probelm.

1. When I manually mount an NFS shared directory on the NFS client of the
cluster server using the service ip of the CLUSTERNFS and I found no problem
accessing the shared directory.
	a. showmount -e <service IP>
	b. mount service.ip.nfs.share:/SHARE /mnt/share
	c. cd /mnt/share
	e. ls =====>>>> i can see the content of the NFS share directory
2. When a failover occurs, that is, in the event of the NODE1 failure and
NODE2 will take over successfully and do the ls command on the /mnt/share of
the NFS client, there is an error and I couln't display the content of the
display. What I need to do is to umount the /mnt/share mounting and remount
the  shared directory.

3. I was told to use the autofs daemon but for now is not an option. Is
there anything I have missed. I have tried googling around but no luck :-(

Please Help.

Thanks in Advance

ahmadz



From f.hackenberger at mediatransfer.com  Mon May 22 11:19:45 2006
From: f.hackenberger at mediatransfer.com (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Mon, 22 May 2006 13:19:45 +0200
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
In-Reply-To: <000001c67d8d$6a329b40$6200a8c0@PROGRESSION>
References: <000001c67d8d$6a329b40$6200a8c0@PROGRESSION>
Message-ID: <44719E51.5070804@mediatransfer.com>

Deval kulshrestha schrieb:
> Hi
> I have also faced similar error, raised this as bug but at last resolved
> this in two ways.
> 
> 1. Network Bonding(more reliable network)
> 2. Updated fence package.
wich fenicng-package you are using? and on what os?
we are using Red Hat Enterprise Linux ES 4.3
with fence wich is version 1.32.18.

> This worked for my setup, I am using HP-iLO as fencing device.
> 
> Regard 
> Deval



From Sydney at igretec.be  Mon May 22 11:39:17 2006
From: Sydney at igretec.be (Sydney Bogaert)
Date: Mon, 22 May 2006 13:39:17 +0200
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
Message-ID: <FD2480BC46BBA1419E60EF9995551A2E0C05CE@prime1>

try to get cman and others from here :

ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/en/RHCS/SRPMS/

you will see : cman-kernel-2.6.9-43.8.src.rpm and not 2.6.9-36 

as you dit for this one :
ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/en/RHGFS/SRPMS/

Sydney.

> -----Message d'origine-----
> De : linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com]De la part de Jason
> Envoy? : lundi 22 mai 2006 2:52
> ? : linux clustering
> Objet : Re: [Linux-cluster] building GFS-6.1 from SRPMS
> 
> 
> Thanks for the response Wendy, but, I couldnt find 
> an RPMS dir anywhere under 
> 
> ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHCS/i386/SRPMS
> ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHGFS/i386/SRPMS
> ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/e
> n/RHGFS/SRPMS/
> 
> but I did find..
> http://www.redhat.com/archives/linux-cluster/2005-October/msg0
0076.html

and that got me farther.. right now, im stuck at 

[root at tf1 rpms]# sudo rpmbuild -bb /usr/src/redhat/SPECS/GFS-kernel.spec
error: Architecture is not included: i386
[root at tf1 rpms]# sudo rpmbuild -bb --target=i686 /usr/src/redhat/SPECS/GFS-kernel.spec
Building target platforms: i686
Building for target i686
error: Failed build dependencies:
        dlm-devel is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernheaders is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-smp-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        kernel-hugemem-devel = 2.6.9-34.EL is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel-smp is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernel-hugemem is needed by GFS-kernel-2.6.9-49.1.i686
        dlm-kernheaders >= 2.6.9 is needed by GFS-kernel-2.6.9-49.1.i686
        cman-kernheaders >= 2.6.9 is needed by GFS-kernel-2.6.9-49.1.i686
[root at tf1 rpms]# 

so i did 
[root at tf1 SPECS]# sudo rpmbuild -bb  --target=i686 cman-kernel.spec 
Building target platforms: i686
Building for target i686
error: Failed build dependencies:
        kernel-smp = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-hugemem = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-smp-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        kernel-hugemem-devel = 2.6.9-11.EL is needed by cman-kernel-2.6.9-36.0.i686
        fake-build-provides is needed by cman-kernel-2.6.9-36.0.i686
[root at tf1 SPECS]# 

so I guess I need to edit the spec file and set the extraversion to 
2.6.9-34

and I cant seem to find where to find a redhat version of fake-build-provides..
I have to use the one from CENTOS ? ;)
http://rpm.karan.org/el4/csgfs/SRPMS/fake-build-provides-1.0-20.src.rpm

regards,
Jason

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From jparsons at redhat.com  Mon May 22 13:01:41 2006
From: jparsons at redhat.com (James Parsons)
Date: Mon, 22 May 2006 09:01:41 -0400
Subject: [Linux-cluster] Fencing trouble
In-Reply-To: <44718306.5070605@lmpt.univ-tours.fr>
References: <44718306.5070605@lmpt.univ-tours.fr>
Message-ID: <4471B635.2080805@redhat.com>

Olivier Thibault wrote:

> Hello,
>
> I'm testing RHCS on Fedora Core 5.
> I have 2 HP DL380G4 and a MSA500G2 shared between them.
> I have a gfs filesystem on the MSA500, with lock_dlm.
> This works fine.
> I created a IP adress service with system-config-cluster, and it also 
> works fine, ie if i shutdown one node, the service start on the other 
> one.
> But I have a trouble with fencing. I use iLo fencing. The fence_ilo 
> script works if I call it directly with the params, but if I run 
> "fence_node node2", I get alternatively the following :
>
> [root at filer1 ~]# fence_node filer2-ha
> agent "fence_ilo" reports: error: User login name was not found
> power_status: unexpected error
>
> [root at filer1 ~]# fence_node filer2-ha
> agent "fence_ilo" reports: error: Syntax error: Line #1: syntax error 
> near "?>" in the line: "?xml version="1.0"?>"
> power_status: unexpected error
>
> [root at filer1 ~]# fence_node filer2-ha
> agent "fence_ilo" reports: error: User login name was not found
> power_status: unexpected error
>
> [root at filer1 ~]# fence_node filer2-ha
> agent "fence_ilo" reports: error: Syntax error: Line #1: syntax error 
> near "?>" in the line: "?xml version="1.0"?>"
> power_status: unexpected error
>
> Here is my cluster.conf file :
> <?xml version="1.0"?>
> <cluster config_version="6" name="filer">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="filer1-ha" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="filer1-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="filer2-ha" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="filer2-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_ilo" hostname="filer1-ilo" 
> login="Administrator" name="filer1-ilo" passwd="xxxxxx"/>
>                 <fencedevice agent="fence_ilo" hostname="filer2-ilo" 
> login="Administrator" name="filer2-ilo" passwd="xxxxxx"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="test" ordered="1" 
> restricted="1">
>                                 <failoverdomainnode name="filer1-ha" 
> priority="1"/>
>                                 <failoverdomainnode name="filer2-ha" 
> priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources/>
>                 <service autostart="1" domain="test" name="test_ip" 
> recovery="relocate">
>                         <ip address="10.68.5.7" monitor_link="1"/>
>                 </service>
>         </rm>
> </cluster>
>
>
> What-s wrong ?
>
> Thanks for your help.
>
> Best regards,
>
> Olivier
>
>
Well, the place to begin is with the version of iLO your machines are 
currently running. Can you check this rev number and post here?

The conf file looks good. The fact that the CLI is working but the 
fence_node command is NOT would suggest that either one of the params in 
the conf file is incorrect (should Administrator be capitalized?) or 
that the param name itself is incorrect. The param names all look ok.

Please let us know the iLO firmware version, Olivier. This is troubling 
to me -- iLO is our most stable supported form of baseboard management 
fencing.

-J



From Olivier.Thibault at lmpt.univ-tours.fr  Mon May 22 14:12:11 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Mon, 22 May 2006 16:12:11 +0200
Subject: [Linux-cluster] Fencing trouble
In-Reply-To: <4471B635.2080805@redhat.com>
References: <44718306.5070605@lmpt.univ-tours.fr> <4471B635.2080805@redhat.com>
Message-ID: <4471C6BB.2020208@lmpt.univ-tours.fr>

Hi

Thanks for your answer.

James Parsons a ?crit :
> Well, the place to begin is with the version of iLO your machines are 
> currently running. Can you check this rev number and post here?

iLO  1.80 pass21 at 18:05:43 Jul 12 2005

I guess reading you (and HP web site) I should upgrade to 1.84.
I'm gonna do it immediatly and inform you.

> Please let us know the iLO firmware version, Olivier. This is troubling 
> to me -- iLO is our most stable supported form of baseboard management 
> fencing.

I'm very happy to read this as I really need this config to be reliable.

Best regards.

Olivier



From jbrassow at redhat.com  Mon May 22 15:11:02 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Mon, 22 May 2006 10:11:02 -0500
Subject: [Linux-cluster] Clvm on imported Gnbd Problem
In-Reply-To: <BAY22-F4F2F00334C03B23253879A6A50@phx.gbl>
References: <BAY22-F4F2F00334C03B23253879A6A50@phx.gbl>
Message-ID: <62ca1e3c1f5e2a825c5399b0eff5c804@redhat.com>

What do you have for 'locking_type' and 'locking_library' in 
/etc/lvm/lvm.conf?

If these are not set to '2' and 'liblvm2clusterlock.so' respectively, 
your second node will know nothing of what is taking place on your 
first node.

Do you have lvm2-cluster installed?

  brassow

On May 21, 2006, at 12:55 PM, nattapon viroonsri wrote:

> rhel4.0,rhcs4 gfs6.1 , gnbd1.0, piranha-0.8.0-1
> Load balance Structure:
> director: director1
> 2 node(real server): cluster1, cluster2
> gnbd server: gnbd_serv
>
> Gnbd server have no cluster software enable.
> I have export disk as block deives on gnbd server with following step
> #gnbd_serv -n
> # gnbd_export -c -e slice0 -d /dev/sdb
>
> then import device on both real server node(cluster1,cluster2) 
> completely
> #gnbd_impoart -i gnbd_serv
>
> and start all cluster service successfull( ccsd, cman,fence, clvm, 
> gfs).
> I builded  lvm at cluster1 (first real server node)  on imported 
> device from gnbd server
> and have no any error and can format device with gfs.
> But when i list lvm on another node (cluster2)  with pvs and lvs 
> command It have not found lvm i have created on cluster1.
> cluster2 only see the original block device that import from gnbd_serv.
>
> Clvm have start successfully but look like both node cannot sync 
> status of lvm.
> Do i miss some thing before create lvm on imported block device  ?
>
> Nattapon,
> Regard
>
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's 
> FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From thorsten.henrici at gfd.de  Mon May 22 17:03:03 2006
From: thorsten.henrici at gfd.de (thorsten.henrici at gfd.de)
Date: Mon, 22 May 2006 19:03:03 +0200
Subject: [Linux-cluster] Kernel Panic (cman) after reboot
Message-ID: <OF5C3021E1.E48DD3BB-ONC1257176.005B1BA4-C1257176.005DAA26@obi.de>


Hello,
I'm experiencing the following reproducable problem:

I have two nodes A and B. I reboot node B and get the following syslog on
node B:

Shutdown Node B - Syslog
----------------------------------------------------------------------
May 20 13:27:24 sdhhdewer38b shutdown: shutting down for system reboot
May 20 13:27:24 sdhhdewer38b init: Switching to runlevel: 6
(...)
May 20 13:27:47 sdhhdewer38b rgmanager: [24641]: <notice> Shutting down
Cluster Service Manager...
May 20 13:27:47 sdhhdewer38b clurgmgrd[31332]: <notice> Shutting down
May 20 13:27:47 sdhhdewer38b clurgmgrd[31332]: <notice> Stopping service
s_ndb_mgmd_ip
May 20 13:27:47 sdhhdewer38b clurgmgrd: [31332]: <info> Removing IPv4
address 10.112.24.20 from eth0
May 20 13:27:57 sdhhdewer38b clurgmgrd[31332]: <notice> Service
s_ndb_mgmd_ip is stopped
May 20 13:27:59 sdhhdewer38b clurgmgrd[31332]: <notice> Shutdown complete,
exiting
May 20 13:28:00 sdhhdewer38b rgmanager: [24641]: <notice> Cluster Service
Manager is stopped.
(...)
May 20 13:28:21 sdhhdewer38b fenced: Stopping fence domain:
May 20 13:28:21 sdhhdewer38b fenced: shutdown succeeded
May 20 13:28:21 sdhhdewer38b fenced:
May 20 13:28:21 sdhhdewer38b fenced:
May 20 13:28:21 sdhhdewer38b rc: Stopping fenced:  succeeded
May 20 13:28:21 sdhhdewer38b lock_gulmd: Stopping lock_gulmd:
May 20 13:28:21 sdhhdewer38b lock_gulmd: shutdown succeeded
May 20 13:28:21 sdhhdewer38b lock_gulmd:                   [
May 20 13:28:21 sdhhdewer38b lock_gulmd:
May 20 13:28:21 sdhhdewer38b rc: Stopping lock_gulmd:  succeeded
May 20 13:28:21 sdhhdewer38b cman: Stopping cman:
May 20 13:28:24 sdhhdewer38b cman: failed to stop cman failed
May 20 13:28:24 sdhhdewer38b cman:                         [
May 20 13:28:24 sdhhdewer38b cman:
May 20 13:28:24 sdhhdewer38b rc: Stopping cman:  failed
May 20 13:28:24 sdhhdewer38b ccsd: Stopping ccsd:
May 20 13:28:24 sdhhdewer38b ccsd[2564]: Stopping ccsd, SIGTERM received.
May 20 13:28:25 sdhhdewer38b ccsd: shutdown succeeded
May 20 13:28:25 sdhhdewer38b ccsd:
May 20 13:28:25 sdhhdewer38b ccsd:
May 20 13:28:25 sdhhdewer38b rc: Stopping ccsd:  succeeded
----------------------------------------------------------

Rebooting Node B crashes (kernel panic) Node B while starting the cman
service (loading the cman module)
That being already prettey bad, it even becomes worse. Node A leaves the
cluster - which brings all services running on it to a halt.

I assume, that this behavior won't occur if I manually remove node B from
the cluster before rebooting. (I haven't tested yet, but will do as soon
as I have the chance to).
Nevertheless I think this behavior is a much too risky thing to have in a
production environment. Is this already known and is there any save way to
fix this?

Syslogs of Node A and B during reboot:

I'm runnig a self-written daemon-process that checks /proc/cluster/status
for the node's membership state. If the cluster hasn't the status 'Member'
for more than 60 seconds, I'm halting the system to get into a consistent
state. Call it something like self-fencing.

Startup Node B - Syslog
--------------------------------------------------------------------------------------
May 20 13:34:19 sdhhdewer38b ccsd[2633]: Connected to cluster
infrastruture via: CMAN/SM Plugin v1.1.5
May 20 13:34:19 sdhhdewer38b ccsd[2633]: Initial status:: Inquorate
May 20 13:34:19 sdhhdewer38b kernel: CMAN: sending membership request
May 20 13:34:19 sdhhdewer38b kernel: CMAN: sending membership request
May 20 13:34:19 sdhhdewer38b kernel: CMAN: got node sdhhdewer38a
May 20 13:36:13 sdhhdewer38b kernel: CMAN: removing node sdhhdewer38a from
the cluster : No response to messages
May 20 13:36:13 sdhhdewer38b kernel: ------------[ cut here ]------------
May 20 13:36:13 sdhhdewer38b kernel: kernel BUG at
/usr/src/build/714635-i686/BUILD/cman-kernel-2.6.9-43/smp/src/membership.c:3150!
May 20 13:36:13 sdhhdewer38b kernel: invalid operand: 0000 [#1]
May 20 13:36:13 sdhhdewer38b kernel: SMP
May 20 13:36:13 sdhhdewer38b kernel: Modules linked in: cman(U) sunrpc md5
ipv6 dm_multipath button battery ac uhci_hcd ehci_hcd hw_random bcm5700(U)
floppy dm_snapshot dm_zero d
m_mirror ext3 jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx(U)
qla2xxx_conf(U) cciss sd_mod scsi_mod
May 20 13:36:13 sdhhdewer38b kernel: CPU:    2
May 20 13:36:13 sdhhdewer38b kernel: EIP:    0060:[<f8ae1e2a>]    Not
tainted VLI
May 20 13:36:13 sdhhdewer38b kernel: EFLAGS: 00010246   (2.6.9-34.ELsmp)
May 20 13:36:13 sdhhdewer38b kernel: EIP is at elect_master+0x2e/0x3a
[cman]
May 20 13:36:13 sdhhdewer38b kernel: eax: 00000000   ebx: f77c7fa0   ecx:
00000080   edx: 00000080
May 20 13:36:13 sdhhdewer38b kernel: esi: f8af5044   edi: f77c7fd8   ebp:
00000000   esp: f77c7f98
May 20 13:36:13 sdhhdewer38b kernel: ds: 007b   es: 007b   ss: 0068
May 20 13:36:13 sdhhdewer38b kernel: Process cman_memb (pid: 2658,
threadinfo=f77c7000 task=f7638730)
May 20 13:36:13 sdhhdewer38b kernel: Stack: f8af4f08 f8adf8d1 c364eb00
f6b23320 f8addeb7 f7638730 f7638730 f8ade09a
May 20 13:36:13 sdhhdewer38b kernel:        0000001f 00000000 f705e6b0
00000000 f7638730 c011e71b 00100100 00200200
May 20 13:36:13 sdhhdewer38b kernel:        00000000 00000000 0000007b
f8added8 00000000 00000000 c01041f5 00000000
May 20 13:36:13 sdhhdewer38b kernel: Call Trace:
May 20 13:36:13 sdhhdewer38b kernel:  [<f8adf8d1>]
a_node_just_died+0x13a/0x199 [cman]
May 20 13:36:13 sdhhdewer38b kernel:  [<f8addeb7>]
process_dead_nodes+0x4e/0x6f [cman]
May 20 13:36:13 sdhhdewer38b kernel:  [<f8ade09a>]
membership_kthread+0x1c2/0x39d [cman]
May 20 13:36:13 sdhhdewer38b kernel:  [<c011e71b>]
default_wake_function+0x0/0xc
May 20 13:36:13 sdhhdewer38b kernel:  [<f8added8>]
membership_kthread+0x0/0x39d [cman]
May 20 13:36:13 sdhhdewer38b kernel:  [<c01041f5>]
kernel_thread_helper+0x5/0xb
May 20 13:36:13 sdhhdewer38b kernel: Code: 28 5e af f8 89 c3 ba 01 00 00
00 39 ca 7d 1c a1 2c 5e af f8 8b 04 90 85 c0 74 0d 83 78 1c 02 75 07 89 03
8b 40 14 eb 0d 42 eb e0 <0f> 0
b 4e 0c 73 8d ae f8 31 c0 5b c3 a1 2c 5e af f8 e8 79 10 67
May 20 13:36:13 sdhhdewer38b kernel:  <0>Fatal exception: panic in 5
seconds
May 20 13:36:18 sdhhdewer38b cman: Timed-out waiting for cluster failed
May 20 13:36:18 sdhhdewer38b lock_gulmd: no <gulm> section detected in
/etc/cluster/cluster.conf succeeded
May 20 13:39:26 sdhhdewer38b syslogd 1.4.1: restart.
--------------------------------------------------------------------------------------

Node A - Syslog
--------------------------------------------------------------------------------------
May 20 13:27:57 sdhhdewer38a clurgmgrd[31341]: <info> Magma Event:
Membership Change
May 20 13:27:57 sdhhdewer38a clurgmgrd[31341]: <info> State change:
sdhhdewer38b DOWN
May 20 13:28:00 sdhhdewer38a clurgmgrd[31341]: <notice> Starting stopped
service s_ndb_mgmd_ip
May 20 13:28:00 sdhhdewer38a clurgmgrd: [31341]: <info> Adding IPv4
address 10.112.24.20 to eth0
May 20 13:28:01 sdhhdewer38a clurgmgrd[31341]: <notice> Service
s_ndb_mgmd_ip started
May 20 13:34:19 sdhhdewer38a kernel: CMAN: node sdhhdewer38b rejoining
May 20 13:34:27 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:34:27 sdhhdewer38a logger: Node will be shut down in 50 seconds
May 20 13:34:37 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:34:37 sdhhdewer38a logger: Node will be shut down in 40 seconds
May 20 13:34:40 sdhhdewer38a clurgmgrd: [31341]: <info> Executing
/etc/init.d/httpd status
May 20 13:34:47 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:34:47 sdhhdewer38a logger: Node will be shut down in 30 seconds
May 20 13:34:57 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:34:57 sdhhdewer38a logger: Node will be shut down in 20 seconds
May 20 13:35:07 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:35:07 sdhhdewer38a logger: Node will be shut down in 10 seconds
May 20 13:35:11 sdhhdewer38a clurgmgrd: [31341]: <info> Executing
/etc/init.d/httpd status
May 20 13:35:18 sdhhdewer38a logger: Node is not a member of the Cluster.
Membership state: Transition-Master
May 20 13:35:18 sdhhdewer38a logger: Node will be shut down in 0 seconds
May 20 13:35:18 sdhhdewer38a logger: sdhhdewer38a is currently not a
cluster member. Shutting down to get into a consistent state !
May 20 13:35:18 sdhhdewer38a logger: Killing the following processes
before shutdown: 31341  2742  2745  2743  2744
May 20 13:35:18 sdhhdewer38a shutdown: shutting down for system reboot
--------------------------------------------------------------------------------------

Mit freundlichen Gr??en / Kind Regards

Thorsten Henrici

Abteilung IT-Kommunikation
GfD Gesellschaft f?r Datenverarbeitung mbH
------------------------------------------------------------------------
e-mail thorsten.henrici at gfd.de
fon: +49 21 9676-1857
fax: +49 21 9676-1932

Industriestrasse 10
D-42929 Wermelskirchen

--
IMPORTANT NOTICE:
This email is confidential, may be legally privileged, and is for the
intended recipient only. Access, disclosure, copying, distribution, or
reliance on any of it by anyone else is prohibited and may be a criminal
offence. Please delete if obtained in error and email confirmation to the sender.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060522/89b81b0d/attachment.htm>

From jason at monsterjam.org  Mon May 22 20:28:06 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 22 May 2006 16:28:06 -0400
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
In-Reply-To: <FD2480BC46BBA1419E60EF9995551A2E0C05CE@prime1>
References: <FD2480BC46BBA1419E60EF9995551A2E0C05CE@prime1>
Message-ID: <20060522202806.GC23926@monsterjam.org>

On Mon, May 22, 2006 at 01:39:17PM +0200, Sydney Bogaert wrote:
> try to get cman and others from here :
> 
> ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/en/RHCS/SRPMS/
> 
> you will see : cman-kernel-2.6.9-43.8.src.rpm and not 2.6.9-36 
> 
> as you dit for this one :
> ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4ES/en/RHGFS/SRPMS/
> 
> Sydney.
> 
ok, thanks for the tip.. now Im down to

[root at tf1 rpms]# sudo rpmbuild -bb --target=i686  /usr/src/redhat/SPECS/cman-kernel.spec
Building target platforms: i686
Building for target i686
error: Failed build dependencies:
        kernel-devel = 2.6.9-34.EL is needed by cman-kernel-2.6.9-43.8.i686
        kernel-smp-devel = 2.6.9-34.EL is needed by cman-kernel-2.6.9-43.8.i686
        kernel-hugemem-devel = 2.6.9-34.EL is needed by cman-kernel-2.6.9-43.8.i686
[root at tf1 rpms]# 
[root at tf1 rpms]# 
[root at tf1 rpms]# rpm -qa | grep -i kernel
kernel-smp-2.6.9-34.EL
kernel-devel-2.6.9-34.EL.root
kernel-2.6.9-34.EL
kernel-utils-2.4-13.1.80
kernel-smp-devel-2.6.9-34.EL.root
kernel-doc-2.6.9-34.EL
[root at tf1 rpms]# 

im thinking its the difference between these 
kernel-devel-2.6.9-34.EL.root   vs. kernel-devel-2.6.9-34.EL
ive never seen them with the .root extension.

couldnt seem to dig up much with google.

regards,
Jason



From jason at monsterjam.org  Mon May 22 22:56:37 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 22 May 2006 18:56:37 -0400
Subject: [Linux-cluster] GFS 6.1 built, now RHCS vs linux-ha..
In-Reply-To: <20060522202806.GC23926@monsterjam.org>
References: <FD2480BC46BBA1419E60EF9995551A2E0C05CE@prime1>
	<20060522202806.GC23926@monsterjam.org>
Message-ID: <20060522225637.GB14704@monsterjam.org>

thanks folks.. Ive got GFS 6.1 and friends installed and loaded.. 

What are my options at this point? I ran the system-config-cluster utility and it brings me up a 
nice gui to start creating the cluster.. I was planning on using linux-ha as the actuall 
active-passive failover for the cluster. Would it be better for me to stick with the 
RedHatClusterSuite utils?

just curious what do you think?

regards,
Jason



From Jon.Stanley at savvis.net  Tue May 23 01:37:56 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Mon, 22 May 2006 20:37:56 -0500
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901EEEEE7@s228130hz1ew08.apptix-01.savvis.net>

 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason
> Sent: Monday, May 22, 2006 3:28 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] building GFS-6.1 from SRPMS
> 
> 

> 
> im thinking its the difference between these 
> kernel-devel-2.6.9-34.EL.root   vs. kernel-devel-2.6.9-34.EL
> ive never seen them with the .root extension.
> 

Yea, you rebuilt the kernel rpm as root *slaps Jason on wrist for
building RPM's as root*, hence the .root extension.  The kernel specfile
contains:

%define rhbsys  %([ -r /etc/beehive-root -o -n "%{?__beehive_build}" ]
&& echo || echo .`whoami`)
%define release %(R="$Revision: 34 $"; RR="${R##: }"; echo
${RR%%?}).EL%{rhbsys}

which basically defines rhbsys as the output of "whoami", and uses that
in the release string.

So you need to install either the redhat supplied kernel (or something
that pretends that it is - you can redefine rhbsys if needed - though
not recommended.  I've had to do that because some third party apps have
depeneded on the version string to determine ABI compatibility, so the
jstanley in there threw it all off - then I redefined it to be something
absurd like 2.6.9-22.0.99 so that it would never be confused with a RH
kernel, and the basis was still obvious, and the script worked), or
modify the cman-kernel specfile to look for the new release of the
kernel (probably the path of least resistance).



From jason at monsterjam.org  Tue May 23 01:46:24 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 22 May 2006 21:46:24 -0400
Subject: [Linux-cluster] building GFS-6.1 from SRPMS
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B901EEEEE7@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B901EEEEE7@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <20060523014624.GC82113@monsterjam.org>

> 
> Yea, you rebuilt the kernel rpm as root *slaps Jason on wrist for
> building RPM's as root*, hence the .root extension.  The kernel specfile
> contains:
> 
aha! I knew you arent supposed to compile source code as root, did not know 
that the same applies for rebuilding rpms.. But I guess it is the same thing. ;)
now I just need to know if its possible to use GFS/lvm2 without all the rest of the RHCS 
stuff.

regards,
Jason



From nattaponv at hotmail.com  Tue May 23 01:17:59 2006
From: nattaponv at hotmail.com (nattapon viroonsri)
Date: Tue, 23 May 2006 01:17:59 +0000
Subject: [Linux-cluster] Clvm on imported Gnbd Problem
In-Reply-To: <62ca1e3c1f5e2a825c5399b0eff5c804@redhat.com>
Message-ID: <BAY22-F20B354B348046FA578947A69B0@phx.gbl>

Yes, clvmd start successfully
Actually, first i test on san with no problem then switch to gnbd

Nattapon,
Regard

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



From pcaulfie at redhat.com  Tue May 23 07:18:22 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 23 May 2006 08:18:22 +0100
Subject: [Linux-cluster] Kernel Panic (cman) after reboot
In-Reply-To: <OF5C3021E1.E48DD3BB-ONC1257176.005B1BA4-C1257176.005DAA26@obi.de>
References: <OF5C3021E1.E48DD3BB-ONC1257176.005B1BA4-C1257176.005DAA26@obi.de>
Message-ID: <4472B73E.4010301@redhat.com>

thorsten.henrici at gfd.de wrote:
> 
> Hello,
> I'm experiencing the following reproducable problem:
> 
> I have two nodes A and B. I reboot node B and get the following syslog
> on node B:
> 
> Shutdown Node B - Syslog
> ----------------------------------------------------------------------
> May 20 13:27:24 sdhhdewer38b shutdown: shutting down for system reboot
> May 20 13:27:24 sdhhdewer38b init: Switching to runlevel: 6
> (...)
> May 20 13:27:47 sdhhdewer38b rgmanager: [24641]: <notice> Shutting down
> Cluster Service Manager...
> May 20 13:27:47 sdhhdewer38b clurgmgrd[31332]: <notice> Shutting down
> May 20 13:27:47 sdhhdewer38b clurgmgrd[31332]: <notice> Stopping service
> s_ndb_mgmd_ip
> May 20 13:27:47 sdhhdewer38b clurgmgrd: [31332]: <info> Removing IPv4
> address 10.112.24.20 from eth0
> May 20 13:27:57 sdhhdewer38b clurgmgrd[31332]: <notice> Service
> s_ndb_mgmd_ip is stopped
> May 20 13:27:59 sdhhdewer38b clurgmgrd[31332]: <notice> Shutdown
> complete, exiting
> May 20 13:28:00 sdhhdewer38b rgmanager: [24641]: <notice> Cluster
> Service Manager is stopped.
> (...)
> May 20 13:28:21 sdhhdewer38b fenced: Stopping fence domain:
> May 20 13:28:21 sdhhdewer38b fenced: shutdown succeeded
> May 20 13:28:21 sdhhdewer38b fenced:
> May 20 13:28:21 sdhhdewer38b fenced:
> May 20 13:28:21 sdhhdewer38b rc: Stopping fenced:  succeeded
> May 20 13:28:21 sdhhdewer38b lock_gulmd: Stopping lock_gulmd:
> May 20 13:28:21 sdhhdewer38b lock_gulmd: shutdown succeeded
> May 20 13:28:21 sdhhdewer38b lock_gulmd:                   [  
> May 20 13:28:21 sdhhdewer38b lock_gulmd:
> May 20 13:28:21 sdhhdewer38b rc: Stopping lock_gulmd:  succeeded
> May 20 13:28:21 sdhhdewer38b cman: Stopping cman:
> May 20 13:28:24 sdhhdewer38b cman: failed to stop cman failed
> May 20 13:28:24 sdhhdewer38b cman:                         [
> May 20 13:28:24 sdhhdewer38b cman:
> May 20 13:28:24 sdhhdewer38b rc: Stopping cman:  failed
> May 20 13:28:24 sdhhdewer38b ccsd: Stopping ccsd:
> May 20 13:28:24 sdhhdewer38b ccsd[2564]: Stopping ccsd, SIGTERM received.
> May 20 13:28:25 sdhhdewer38b ccsd: shutdown succeeded
> May 20 13:28:25 sdhhdewer38b ccsd:
> May 20 13:28:25 sdhhdewer38b ccsd:
> May 20 13:28:25 sdhhdewer38b rc: Stopping ccsd:  succeeded
> ----------------------------------------------------------
> 
> Rebooting Node B crashes (kernel panic) Node B while starting the cman
> service (loading the cman module)
> That being already prettey bad, it even becomes worse. Node A leaves the
> cluster - which brings all services running on it to a halt.
> 
> I assume, that this behavior won't occur if I manually remove node B
> from the cluster before rebooting. (I haven't tested yet, but will do as
> soon as I have the chance to).
> Nevertheless I think this behavior is a much too risky thing to have in
> a production environment. Is this already known and is there any save
> way to fix this?
> 
> Syslogs of Node A and B during reboot:
> 
> I'm runnig a self-written daemon-process that checks
> /proc/cluster/status for the node's membership state. If the cluster
> hasn't the status 'Member' for more than 60 seconds, I'm halting the
> system to get into a consistent state. Call it something like self-fencing.
> 
> Startup Node B - Syslog
> --------------------------------------------------------------------------------------
> 
> May 20 13:34:19 sdhhdewer38b ccsd[2633]: Connected to cluster
> infrastruture via: CMAN/SM Plugin v1.1.5
> May 20 13:34:19 sdhhdewer38b ccsd[2633]: Initial status:: Inquorate
> May 20 13:34:19 sdhhdewer38b kernel: CMAN: sending membership request
> May 20 13:34:19 sdhhdewer38b kernel: CMAN: sending membership request
> May 20 13:34:19 sdhhdewer38b kernel: CMAN: got node sdhhdewer38a
> May 20 13:36:13 sdhhdewer38b kernel: CMAN: removing node sdhhdewer38a
> from the cluster : No response to messages
> May 20 13:36:13 sdhhdewer38b kernel: ------------[ cut here ]------------
> May 20 13:36:13 sdhhdewer38b kernel: kernel BUG at
> /usr/src/build/714635-i686/BUILD/cman-kernel-2.6.9-43/smp/src/membership.c:3150!
> 
> May 20 13:36:13 sdhhdewer38b kernel: invalid operand: 0000 [#1]
> May 20 13:36:13 sdhhdewer38b kernel: SMP
> May 20 13:36:13 sdhhdewer38b kernel: Modules linked in: cman(U) sunrpc
> md5 ipv6 dm_multipath button battery ac uhci_hcd ehci_hcd hw_random
> bcm5700(U) floppy dm_snapshot dm_zero d
> m_mirror ext3 jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx(U)
> qla2xxx_conf(U) cciss sd_mod scsi_mod
> May 20 13:36:13 sdhhdewer38b kernel: CPU:    2
> May 20 13:36:13 sdhhdewer38b kernel: EIP:    0060:[<f8ae1e2a>]    Not
> tainted VLI
> May 20 13:36:13 sdhhdewer38b kernel: EFLAGS: 00010246   (2.6.9-34.ELsmp)
> May 20 13:36:13 sdhhdewer38b kernel: EIP is at elect_master+0x2e/0x3a
> [cman]
> May 20 13:36:13 sdhhdewer38b kernel: eax: 00000000   ebx: f77c7fa0  
> ecx: 00000080   edx: 00000080
> May 20 13:36:13 sdhhdewer38b kernel: esi: f8af5044   edi: f77c7fd8  
> ebp: 00000000   esp: f77c7f98
> May 20 13:36:13 sdhhdewer38b kernel: ds: 007b   es: 007b   ss: 0068
> May 20 13:36:13 sdhhdewer38b kernel: Process cman_memb (pid: 2658,
> threadinfo=f77c7000 task=f7638730)
> May 20 13:36:13 sdhhdewer38b kernel: Stack: f8af4f08 f8adf8d1 c364eb00
> f6b23320 f8addeb7 f7638730 f7638730 f8ade09a
> May 20 13:36:13 sdhhdewer38b kernel:        0000001f 00000000 f705e6b0
> 00000000 f7638730 c011e71b 00100100 00200200
> May 20 13:36:13 sdhhdewer38b kernel:        00000000 00000000 0000007b
> f8added8 00000000 00000000 c01041f5 00000000
> May 20 13:36:13 sdhhdewer38b kernel: Call Trace:
> May 20 13:36:13 sdhhdewer38b kernel:  [<f8adf8d1>]
> a_node_just_died+0x13a/0x199 [cman]
> May 20 13:36:13 sdhhdewer38b kernel:  [<f8addeb7>]
> process_dead_nodes+0x4e/0x6f [cman]
> May 20 13:36:13 sdhhdewer38b kernel:  [<f8ade09a>]
> membership_kthread+0x1c2/0x39d [cman]
> May 20 13:36:13 sdhhdewer38b kernel:  [<c011e71b>]
> default_wake_function+0x0/0xc
> May 20 13:36:13 sdhhdewer38b kernel:  [<f8added8>]
> membership_kthread+0x0/0x39d [cman]
> May 20 13:36:13 sdhhdewer38b kernel:  [<c01041f5>]
> kernel_thread_helper+0x5/0xb
> May 20 13:36:13 sdhhdewer38b kernel: Code: 28 5e af f8 89 c3 ba 01 00 00
> 00 39 ca 7d 1c a1 2c 5e af f8 8b 04 90 85 c0 74 0d 83 78 1c 02 75 07 89
> 03 8b 40 14 eb 0d 42 eb e0 <0f> 0
> b 4e 0c 73 8d ae f8 31 c0 5b c3 a1 2c 5e af f8 e8 79 10 67
> May 20 13:36:13 sdhhdewer38b kernel:  <0>Fatal exception: panic in 5
> seconds
> May 20 13:36:18 sdhhdewer38b cman: Timed-out waiting for cluster failed
> May 20 13:36:18 sdhhdewer38b lock_gulmd: no <gulm> section detected in
> /etc/cluster/cluster.conf succeeded
> May 20 13:39:26 sdhhdewer38b syslogd 1.4.1: restart.
> --------------------------------------------------------------------------------------
> 
> 
> Node A - Syslog
> --------------------------------------------------------------------------------------
> 
> May 20 13:27:57 sdhhdewer38a clurgmgrd[31341]: <info> Magma Event:
> Membership Change
> May 20 13:27:57 sdhhdewer38a clurgmgrd[31341]: <info> State change:
> sdhhdewer38b DOWN
> May 20 13:28:00 sdhhdewer38a clurgmgrd[31341]: <notice> Starting stopped
> service s_ndb_mgmd_ip
> May 20 13:28:00 sdhhdewer38a clurgmgrd: [31341]: <info> Adding IPv4
> address 10.112.24.20 to eth0
> May 20 13:28:01 sdhhdewer38a clurgmgrd[31341]: <notice> Service
> s_ndb_mgmd_ip started
> May 20 13:34:19 sdhhdewer38a kernel: CMAN: node sdhhdewer38b rejoining
> May 20 13:34:27 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:34:27 sdhhdewer38a logger: Node will be shut down in 50 seconds
> May 20 13:34:37 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:34:37 sdhhdewer38a logger: Node will be shut down in 40 seconds
> May 20 13:34:40 sdhhdewer38a clurgmgrd: [31341]: <info> Executing
> /etc/init.d/httpd status
> May 20 13:34:47 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:34:47 sdhhdewer38a logger: Node will be shut down in 30 seconds
> May 20 13:34:57 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:34:57 sdhhdewer38a logger: Node will be shut down in 20 seconds
> May 20 13:35:07 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:35:07 sdhhdewer38a logger: Node will be shut down in 10 seconds
> May 20 13:35:11 sdhhdewer38a clurgmgrd: [31341]: <info> Executing
> /etc/init.d/httpd status
> May 20 13:35:18 sdhhdewer38a logger: Node is not a member of the
> Cluster. Membership state: Transition-Master
> May 20 13:35:18 sdhhdewer38a logger: Node will be shut down in 0 seconds
> May 20 13:35:18 sdhhdewer38a logger: sdhhdewer38a is currently not a
> cluster member. Shutting down to get into a consistent state !
> May 20 13:35:18 sdhhdewer38a logger: Killing the following processes
> before shutdown: 31341  2742  2745  2743  2744
> May 20 13:35:18 sdhhdewer38a shutdown: shutting down for system reboot
> --------------------------------------------------------------------------------------
>

This is in bugzilla:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777

a fix is in CVS.

-- 

patrick



From rpeterso at redhat.com  Tue May 23 14:10:01 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Tue, 23 May 2006 09:10:01 -0500
Subject: [Linux-cluster] GFS 6.1 built, now RHCS vs linux-ha..
In-Reply-To: <20060522225637.GB14704@monsterjam.org>
References: <FD2480BC46BBA1419E60EF9995551A2E0C05CE@prime1>
	<20060522202806.GC23926@monsterjam.org>
	<20060522225637.GB14704@monsterjam.org>
Message-ID: <1148393401.18630.3.camel@technetium.msp.redhat.com>

On Mon, 2006-05-22 at 18:56 -0400, Jason wrote:
> thanks folks.. Ive got GFS 6.1 and friends installed and loaded.. 
> 
> What are my options at this point? I ran the system-config-cluster utility and it brings me up a 
> nice gui to start creating the cluster.. I was planning on using linux-ha as the actuall 
> active-passive failover for the cluster. Would it be better for me to stick with the 
> RedHatClusterSuite utils?
> 
> just curious what do you think?
> 
> regards,
> Jason

I'm in favor of using Red Hat Cluster Suite.  Of course, I work there, so I'm biased.
The rgmanager tool is designed to provide seamless failover for many types of services,
even custom-designed ones.

Regards,

Bob Peterson
Red Hat Cluster Suite




From sgrieve at star-telegram.com  Tue May 23 14:19:32 2006
From: sgrieve at star-telegram.com (Grieve, Shane)
Date: Tue, 23 May 2006 09:19:32 -0500
Subject: [Linux-cluster] (no subject)
Message-ID: <C319D2BCA4F21E45B3403D4C1707014B020347FC@FWSTEX03.fwst.fwstroot.star-telegram.com>

Hello,
	I am having problems accessing certain files shared to MAC
clients though Netatalk 2.0.3 and PC though though samba. When you try
and access the files usually text files though the console using vi etc
the session hangs. RedHat Enterprise 4 kernel 2.6.9-5 2 x Dell Poweredge
2850 HDS 9570 and fence_mcdata.

Thanks
Shane    

May 23 08:51:10 maccluster1 kernel: ------------[ cut here ]------------
May 23 08:51:10 maccluster1 kernel: kernel BUG at
/root/cluster/gfs-kernel/src/dlm/plock.c:797!
May 23 08:51:10 maccluster1 kernel: invalid operand: 0000 [#7]
May 23 08:51:10 maccluster1 kernel: SMP 
May 23 08:51:10 maccluster1 kernel: Modules linked in: nfs appletalk(U)
nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core
lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) sunrpc md5 ipv6 button
battery ac uhci_hcd ehci_hcd e1000(U) floppy qla2300 qla2xxx
scsi_transport_fc sg dm_snapshot dm_zero dm_mirror ext3 jbd(U) dm_mod
megaraid_mbox(U) megaraid_mm(U) sd_mod scsi_mod
May 23 08:51:10 maccluster1 kernel: CPU:    1
May 23 08:51:10 maccluster1 kernel: EIP:    0060:[<f89b0480>]
Tainted: GF     VLI
May 23 08:51:10 maccluster1 kernel: EFLAGS: 00010246   (2.6.9-5.ELsmp) 
May 23 08:51:10 maccluster1 kernel: EIP is at plock_internal+0x1a2/0x29c
[lock_dlm]
May 23 08:51:10 maccluster1 kernel: eax: 00000001   ebx: f5d6cfc0   ecx:
ccff0e50   edx: f89b29d3
May 23 08:51:10 maccluster1 kernel: esi: 00008000   edi: 00000000   ebp:
de938c80   esp: ccff0e4c
May 23 08:51:10 maccluster1 kernel: ds: 007b   es: 007b   ss: 0068
May 23 08:51:10 maccluster1 kernel: Process afpd (pid: 18192,
threadinfo=ccff0000 task=e10cf330)
May 23 08:51:10 maccluster1 kernel: Stack: f89b29d3 f89b296d 0000031d
f89b2944 f89b2a6f 0a044081 00000000 f5dad740 
May 23 08:51:10 maccluster1 kernel:        de938cb0 00000000 e1009280
f5dad740 f5dad740 00000000 de938c80 00008c3c 
May 23 08:51:10 maccluster1 kernel:        00000000 f89b09c8 00000000
00000c3d 00000000 00008c3c 00000000 00000c3d 
May 23 08:51:10 maccluster1 kernel: Call Trace:
May 23 08:51:10 maccluster1 kernel:  [<f89b09c8>]
lm_dlm_plock+0x1a6/0x24a [lock_dlm]
May 23 08:51:10 maccluster1 kernel:  [<f8f27774>] gfs_lm_plock+0x34/0x44
[gfs]
May 23 08:51:10 maccluster1 kernel:  [<f8f32b9c>] gfs_lock+0xe4/0xf2
[gfs]
May 23 08:51:10 maccluster1 kernel:  [<f8f32ab8>] gfs_lock+0x0/0xf2
[gfs]
May 23 08:51:10 maccluster1 kernel:  [<c01686c5>]
fcntl_setlk64+0x156/0x291
May 23 08:51:10 maccluster1 kernel:  [<c0127d49>]
__mod_timer+0x101/0x10b
May 23 08:51:10 maccluster1 kernel:  [<c0124257>]
do_setitimer+0x16c/0x1c5
May 23 08:51:10 maccluster1 kernel:  [<c01562c4>] fget+0x3b/0x42
May 23 08:51:10 maccluster1 kernel:  [<c016481f>] sys_fcntl64+0x5a/0x7d
May 23 08:51:10 maccluster1 kernel:  [<c02c62a3>] syscall_call+0x7/0xb
May 23 08:51:10 maccluster1 kernel:  [<c02c007b>]
unix_detach_fds+0x2e/0x31
May 23 08:51:10 maccluster1 kernel: Code: ff ff a1 a0 48 31 c0 50 68 6f
2a 9b f8 68 44 29 9b f8 68 1d 03 00 00 68 6d 29 9b f8 e8 90 07 77 c7 68
d3 29 9b f8 e8 86 07 77 c7 <0f> 0b 1d 03 44 29 9b f8 68 f1 29 9b f8 e8
2d ff 76 c7 89 5c 24



From lhh at redhat.com  Tue May 23 16:21:53 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 23 May 2006 12:21:53 -0400
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
In-Reply-To: <44718B08.8080808@mediatransfer.com>
References: <44718B08.8080808@mediatransfer.com>
Message-ID: <1148401313.20766.16.camel@ayanami.boston.redhat.com>

On Mon, 2006-05-22 at 11:57 +0200, Falk Hackenberger - MediaTransfer AG
Netresearch & Consulting wrote:
> Hello,
> 
> I try to use the CS4 for managing a 2-node-cluster.
> also i have a working fenicing-device (apc-powerswitch).
> I think I have found a design-error in the CS4.
> 
> If you boot up one node(1) and this node(1) cant reach the other node(2)
> node(1) trys to fence node(2).
> This is fine...
> but if the fencing fails, node(1)
> starts his services via rgmanager without to be shure that the other
> node is not working.
> volia and you have a splitt-brain problem.

> any solution for this?

rgmanager should not start services unless the node is quorate.  The
cluster should not ever become quorate if fencing fails.  Sounds like a
bug.

-- Lon



From lhh at redhat.com  Tue May 23 16:26:08 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 23 May 2006 12:26:08 -0400
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
In-Reply-To: <1148401313.20766.16.camel@ayanami.boston.redhat.com>
References: <44718B08.8080808@mediatransfer.com>
	<1148401313.20766.16.camel@ayanami.boston.redhat.com>
Message-ID: <1148401568.20766.18.camel@ayanami.boston.redhat.com>

On Tue, 2006-05-23 at 12:21 -0400, Lon Hohberger wrote:
> On Mon, 2006-05-22 at 11:57 +0200, Falk Hackenberger - MediaTransfer AG
> Netresearch & Consulting wrote:
> > Hello,
> > 
> > I try to use the CS4 for managing a 2-node-cluster.
> > also i have a working fenicing-device (apc-powerswitch).
> > I think I have found a design-error in the CS4.
> > 
> > If you boot up one node(1) and this node(1) cant reach the other node(2)
> > node(1) trys to fence node(2).
> > This is fine...
> > but if the fencing fails, node(1)
> > starts his services via rgmanager without to be shure that the other
> > node is not working.
> > volia and you have a splitt-brain problem.
> 
> > any solution for this?
> 
> rgmanager should not start services unless the node is quorate.  The
> cluster should not ever become quorate if fencing fails.  Sounds like a
> bug.

Wait, fencing fails or fenced fails to start?

--



From redhat at meharrington.com  Tue May 23 16:43:40 2006
From: redhat at meharrington.com (Mark Harrington)
Date: Tue, 23 May 2006 12:43:40 -0400
Subject: [Linux-cluster] NFS Server Cluster
Message-ID: <001201c67e88$0cb6b3b0$a60b0aac@exeter.com>

Hi All,

I am playing around with a 2 node NFS Server cluster with RHCS/GFS 4 Update
3 and had a few questions. 

I could not find much documentation on active-active NFS clusters with RHCS
4 so I set up a pseudo active-active cluster as follows: 
 - Created 2 RHCS services that export a single GFS volume with NFS.  
 - Each service has a prioritized fail-over domain so if:
    - Both nodes are up each node will export the GFS volume
    - A node goes down the service (& VIP) will fail over to the other node.

 - 1/2 the NFS clients (Web server farm) will mount off one VIP and the 
   other 1/2 off the other.
 - The NFS clients are only permitted to mount 1 service

1) We are still testing this configuration but has anyone else setup an NFS
cluster like this?  Is there a better way?  Can you foresee any issues with
this setup?

3) Do we need to put /var/lib/nfs on a GFS volume so both NFS servers can
update the locks file?  Will there be any issues if both NFS servers try to
update the locks file at the same time?

3) Is it possible to do this with NFSv4?

Thanks for any help,
Mark





From Britt.Treece at savvis.net  Tue May 23 19:50:04 2006
From: Britt.Treece at savvis.net (Treece, Britt)
Date: Tue, 23 May 2006 14:50:04 -0500
Subject: [Linux-cluster] GFS 6.0 adding journals
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901DD92D9@s228130hz1ew08.apptix-01.savvis.net>

I'm adding 3 nodes to my 7node cluster and don't have space on the GFS
mount to add the additional journals.  I'm planning on adding a 1GB
subpool for the additional journals but am not clear what the correct
procedure for doing this is.  

 

My current pool configuration is...

 

poolname pool_gfs01

#minor <dynamically assigned>

subpools 1

subpool 0 0 1 gfs_data

pooldevice 0 0 /dev/sda2

 

I'm planning on changing it to...

 

poolname pool_gfs01

subpools 2

subpool 0 0 1 gfs_data

subpool 1 0 1 gfs_journal

pooldevice 0 0 /dev/sda2

pooldevice 1 0 /dev/sdb1

 

Assuming the above config change is correct what action do I perform
next to fill this new subpool of gfs01 with journals for the 3 nodes I'm
adding leaving a couple extra journals future nodes.

 

Thanks

 

Britt

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060523/aed819db/attachment.htm>

From Harry.Sutton at hp.com  Tue May 23 20:53:01 2006
From: Harry.Sutton at hp.com (Sutton, Harry (MSE))
Date: Tue, 23 May 2006 16:53:01 -0400
Subject: [Linux-cluster] Configuring NFS on Red Hat Enterprise Linux 4 AS /
 Cluster Suite / GFS 6.1
Message-ID: <4473762D.5000101@hp.com>

This seems like a fairly straightforward thing to do, but I'm having a
heck of a time accomplishing it.

I have a three-node cluster (RHEL 4 AS U3) with a clustered web server
working fine (chkconfig httpd off on all nodes, init script as a
resource attached to the service, floating service ip address, shared
ext3 filesystem). But I can't get nfs working.

All the documentation I can find explains how to set this up for the
RHEL 3 clustering, but it doesn't translate directly to RHEL 4. Here's
what I want to do:

1. Export a shared ext3 filesystem via nfs.
2. Export a GFS filesystem (currently accessible to all cluster members)
via nfs for use by non-cluster members.

I created resources for the floating IP address, and a separate NFS
Client resource for each system (or group of systems) that I want to be
able to mount them. I created an NFS export resource pointing to the
filesystem to be exported. But I can't figure out what
system-config-cluster is looking for in the fields for the NFS Mount
resource. What's supposed to go into the "Host" field in that resource
box? "Mount" I'm guessing is the device, e.g., /dev/sda1, and "Export"
is the mount point, e.g., /nfs-share. But that's just it, I'm guessing,
because I can't find it documented anywhere.

The RHEL 3 cluster documentation says portmapper and nfs have to be
running on all cluster nodes, and to not use the init script to start it
up (contrary to the web service, which is confusing). It also says to
NOT have the filesystems in question in either /etc/fstab or
/etc/exports. So how is this supposed to work? And if it's documented,
where?!

Thanks,

    /Harry Sutton
     Hewlett-Packard Company
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 6145 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060523/eaf26245/attachment.bin>

From Olivier.Thibault at lmpt.univ-tours.fr  Tue May 23 21:45:17 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Tue, 23 May 2006 23:45:17 +0200
Subject: [Linux-cluster] Fencing trouble - solved
In-Reply-To: <4471C6BB.2020208@lmpt.univ-tours.fr>
References: <44718306.5070605@lmpt.univ-tours.fr> <4471B635.2080805@redhat.com>
	<4471C6BB.2020208@lmpt.univ-tours.fr>
Message-ID: <4473826D.6000606@lmpt.univ-tours.fr>

Hi,

Olivier Thibault wrote :
> 
> James Parsons wrote :
>> Well, the place to begin is with the version of iLO your machines are 
>> currently running. Can you check this rev number and post here?
> 
> iLO  1.80 pass21 at 18:05:43 Jul 12 2005
> 
> I guess reading you (and HP web site) I should upgrade to 1.84.
> I'm gonna do it immediatly and inform you.
> 
After upgrading iLo to 1.84, fencing works fine.

Thanks for your help.

Best regards,

Olivier



From rpeterso at redhat.com  Tue May 23 22:27:13 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Tue, 23 May 2006 17:27:13 -0500
Subject: [Linux-cluster] Configuring NFS on Red Hat Enterprise Linux 4
	AS / Cluster Suite / GFS 6.1
In-Reply-To: <4473762D.5000101@hp.com>
References: <4473762D.5000101@hp.com>
Message-ID: <1148423233.6545.43.camel@technetium.msp.redhat.com>

On Tue, 2006-05-23 at 16:53 -0400, Sutton, Harry (MSE) wrote:
> This seems like a fairly straightforward thing to do, but I'm having a
> heck of a time accomplishing it.

Hi Harry,

Since I've got some experience in this matter, earlier in the year, I
tried to write an "NFS/GFS Cookbook" that was a nuts-and-bolts
("example-driven") approach to configuring NFS to work with Cluster
Suite and GFS exactly as you're describing.

So far, I haven't had much time to refine it and make it into an
officially sanctioned "Red Hat Document."  It's just a draft.  However, 
because I'm seeing more questions about this lately, I've decided to 
make it available on our cvs source directory here:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf 

I hope this helps.
Just keep in mind: This is a draft.  There may be mistakes.
If you let me know what they are, I can fix the doc though...
Hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite




From rpeterso at redhat.com  Tue May 23 22:35:40 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Tue, 23 May 2006 17:35:40 -0500
Subject: [Linux-cluster] NFS Server Cluster
In-Reply-To: <001201c67e88$0cb6b3b0$a60b0aac@exeter.com>
References: <001201c67e88$0cb6b3b0$a60b0aac@exeter.com>
Message-ID: <1148423740.6545.51.camel@technetium.msp.redhat.com>

On Tue, 2006-05-23 at 12:43 -0400, Mark Harrington wrote:
> Hi All,
> 
> I am playing around with a 2 node NFS Server cluster with RHCS/GFS 4 Update
> 3 and had a few questions. 
> 
> I could not find much documentation on active-active NFS clusters with RHCS
> 4 so I set up a pseudo active-active cluster as follows: 
>  - Created 2 RHCS services that export a single GFS volume with NFS.  
>  - Each service has a prioritized fail-over domain so if:
>     - Both nodes are up each node will export the GFS volume
>     - A node goes down the service (& VIP) will fail over to the other node.
> 
>  - 1/2 the NFS clients (Web server farm) will mount off one VIP and the 
>    other 1/2 off the other.
>  - The NFS clients are only permitted to mount 1 service
> 
> 1) We are still testing this configuration but has anyone else setup an NFS
> cluster like this?  Is there a better way?  Can you foresee any issues with
> this setup?

I don't see any issues.  As a matter of fact, I recommend exactly this
in the "NFS/GFS cookbook" document I posted a few minutes ago to
linux-cluster:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf 

> 3) Do we need to put /var/lib/nfs on a GFS volume so both NFS servers can
> update the locks file?  Will there be any issues if both NFS servers try to
> update the locks file at the same time?

I don't think so, but I could be wrong.

> 3) Is it possible to do this with NFSv4?

I'm pretty sure you can.

> Thanks for any help,
> Mark

Bob Peterson
Red Hat Cluster Suite




From wcheng at redhat.com  Wed May 24 02:11:36 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 23 May 2006 22:11:36 -0400
Subject: [Linux-cluster] NFS Server Cluster
In-Reply-To: <1148423740.6545.51.camel@technetium.msp.redhat.com>
References: <001201c67e88$0cb6b3b0$a60b0aac@exeter.com>
	<1148423740.6545.51.camel@technetium.msp.redhat.com>
Message-ID: <1148436697.4677.69.camel@localhost.localdomain>

On Tue, 2006-05-23 at 17:35 -0500, Robert S Peterson wrote:
> On Tue, 2006-05-23 at 12:43 -0400, Mark Harrington wrote:
> > 3) Do we need to put /var/lib/nfs on a GFS volume so both NFS servers can
> > update the locks file?  Will there be any issues if both NFS servers try to
> > update the locks file at the same time?
> 
> I don't think so, but I could be wrong.

The current RHCS out in the fields have few locking issues that need to
get ironed out. The first visible problem is the existing Linux NLM
doesn't call filesystem specific lock method - implies GFS is not able
to support file locking via NFS across different servers. We're working
with NLM/NFS developers to solve these issues at this moment.

Don't confuse the issue with file locking across GFS nodes. File locking
(both flock and posix lock) across different GFS servers has been
supported.

Another specific is that we don't support NFS lock failover at this
moment. This is also due to some other Linux NLM issues and we're
working on it.


> > 3) Is it possible to do this with NFSv4?
> 

NFSv4 has state information that may not work well in a cluster
environment. It is again currently a work-in-progress effort. The CITI
group from University of Michigan is leading this development effort at
this moment. 


-- Wendy



From f.hackenberger at mediatransfer.de  Wed May 24 09:21:45 2006
From: f.hackenberger at mediatransfer.de (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Wed, 24 May 2006 11:21:45 +0200
Subject: [Linux-cluster] CS4 have a splitt-brain-problem in  2-node setup
In-Reply-To: <1148401568.20766.18.camel@ayanami.boston.redhat.com>
References: <44718B08.8080808@mediatransfer.com>	<1148401313.20766.16.camel@ayanami.boston.redhat.com>
	<1148401568.20766.18.camel@ayanami.boston.redhat.com>
Message-ID: <447425A9.7040507@mediatransfer.de>

Lon Hohberger schrieb:
> On Tue, 2006-05-23 at 12:21 -0400, Lon Hohberger wrote:
> 
>>On Mon, 2006-05-22 at 11:57 +0200, Falk Hackenberger - MediaTransfer AG
>>Netresearch & Consulting wrote:
>>
>>>Hello,
>>>
>>>I try to use the CS4 for managing a 2-node-cluster.
>>>also i have a working fenicing-device (apc-powerswitch).
>>>I think I have found a design-error in the CS4.
>>>
>>>If you boot up one node(1) and this node(1) cant reach the other node(2)
>>>node(1) trys to fence node(2).
>>>This is fine...
>>>but if the fencing fails, node(1)
>>>starts his services via rgmanager without to be shure that the other
>>>node is not working.
>>>volia and you have a splitt-brain problem.
>>
>>>any solution for this?
>>
>>rgmanager should not start services unless the node is quorate.  The
>>cluster should not ever become quorate if fencing fails.  Sounds like a
>>bug.

> Wait, fencing fails or fenced fails to start?
fenced start but fencing fails
simple way to reproduce it:

just put the node(1) in a other vlan, so that he can not reach his fence
device or the other node(2)
then reset node(1) do not reboot him...



From thorsten.henrici at gfd.de  Wed May 24 09:43:05 2006
From: thorsten.henrici at gfd.de (thorsten.henrici at gfd.de)
Date: Wed, 24 May 2006 11:43:05 +0200
Subject: [Linux-cluster] Kernel Panic (cman) after reboot
Message-ID: <OFACA5FF2C.38887248-ONC1257178.0034BAB3-C1257178.0035621C@obi.de>


Patrick Caulfield wrote:

> This is in bugzilla:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777
> a fix is in CVS.

Thanks for the prompt reply. Will this fix be available via RHN in the
near future. We plan to go productive with the cluster in about two weeks,
so it would be good to have a solution until then. If I used the CVS code,
would it get support for this (I have two valid liceneses for Red Hat
Cluster Suite).

Mit freundlichen Gr??en / Kind Regards

Thorsten Henrici


--
IMPORTANT NOTICE:
This email is confidential, may be legally privileged, and is for the
intended recipient only. Access, disclosure, copying, distribution, or
reliance on any of it by anyone else is prohibited and may be a criminal
offence. Please delete if obtained in error and email confirmation to the sender.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/14ab00b9/attachment.htm>

From saju8 at rediffmail.com  Wed May 24 09:26:29 2006
From: saju8 at rediffmail.com (saju  john)
Date: 24 May 2006 09:26:29 -0000
Subject: [Linux-cluster] Q&#1614;uroum loss restart cluster node
Message-ID: <20060524092629.11552.qmail@webmail30.rediffmail.com>

   
Dear All,

I would like to know whether in Redhat AS3 U3 a loss of quroum(connection loss or currupt) will restart the server(i mean node) itself.
Kernel : 2.4.21-20.Elsmp
Clumanager : clumanager-1.2.16-1

Could anyone please confirm this.

Thank You,
Saju john

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/6554974c/attachment.htm>

From pcaulfie at redhat.com  Wed May 24 10:06:17 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 24 May 2006 11:06:17 +0100
Subject: [Linux-cluster] Kernel Panic (cman) after reboot
In-Reply-To: <OFACA5FF2C.38887248-ONC1257178.0034BAB3-C1257178.0035621C@obi.de>
References: <OFACA5FF2C.38887248-ONC1257178.0034BAB3-C1257178.0035621C@obi.de>
Message-ID: <44743019.7020004@redhat.com>

thorsten.henrici at gfd.de wrote:
> 
> /Patrick Caulfield/ wrote:
> 
>> This is in bugzilla:
>> _https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777_
>> a fix is in CVS.
> 
> Thanks for the prompt reply. Will this fix be available via RHN in the
> near future. We plan to go productive with the cluster in about two
> weeks, so it would be good to have a solution until then. If I used the
> CVS code, would it get support for this (I have two valid liceneses for
> Red Hat Cluster Suite).
> 

I believe a supported fix is in the works. I don't have a date for release
though, sorry.

-- 

patrick



From troels at arvin.dk  Wed May 24 11:40:58 2006
From: troels at arvin.dk (Troels Arvin)
Date: Wed, 24 May 2006 13:40:58 +0200
Subject: [Linux-cluster] Please update cman-kernel and dlm-kernel when the
	main kernel is updated
Message-ID: <pan.2006.05.24.11.40.57.187000@arvin.dk>

I just ran an up2date on some servers in a RHEL 4 ES cluster, and the
kernel was updated. Now, I tried rebooting one of the nodes in the
cluster, and at least one of the cluster management software daemons
didn't properly start, seemingly due to kernel module mismatches.
Reverting to the older kernel made the cluster node join the cluster again.

This is not impressive; it's actually very disappointing. Red Hat, please
coordinate things.

By the way: Why are special kernel modules even needed by the cluster
software if GFS isn't used?

-- 
Greetings from Troels Arvin




From cjk at techma.com  Wed May 24 11:58:59 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 24 May 2006 07:58:59 -0400
Subject: [Linux-cluster] Please update cman-kernel and dlm-kernel when
	themain kernel is updated
In-Reply-To: <pan.2006.05.24.11.40.57.187000@arvin.dk>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EA1@tmaemail.techma.com>

Whoa....   

Before you go slapping redhat around I think you should start checking what
updates are
being done and why. If you are running a cluster, it's your responsebility to
make sure
any updates are first "needed" and second, are going to mesh with your
config. 

To answer your question about clustering, the newer clusterring and GFS
packages use the
same underlying subsystems. It's that simple. 

It's done that way to ensure a constant view of "clustered" parts, get
consistent events
etc. Then RHEL 3 way of doing clusters is no more and with good reason. If
they subsystems
don't talk to each other, they tend to step on each other.

That all said, it would be nice if GFS/Cluster updates did come out at or
around the same 
time as kernel updates. I do not use up2date and cannot in my situation so
manual downloads
do it for me. That's ok tho cuz I like to know whats happening on my
clusters.

Finally, this is not the place to complain about such things. If you have a
beef with Redhat
then you should be calling under your support contract and complain there
where an official 
chain of events occurrs...  Your more likely to get your desired result.
Paying customers
get the ear. Anyone (ie non-paying cluster users) can complain here so it's
of much less 
value I think. Your mileage may vary.


Just my 01 cents...


Cheers


Corey

NOTE: My views are MY OWN, NOT REDHAT's  (They may be more kind....)

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Troels Arvin
Sent: Wednesday, May 24, 2006 7:41 AM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Please update cman-kernel and dlm-kernel when
themain kernel is updated

I just ran an up2date on some servers in a RHEL 4 ES cluster, and the kernel
was updated. Now, I tried rebooting one of the nodes in the cluster, and at
least one of the cluster management software daemons didn't properly start,
seemingly due to kernel module mismatches.
Reverting to the older kernel made the cluster node join the cluster again.

This is not impressive; it's actually very disappointing. Red Hat, please
coordinate things.

By the way: Why are special kernel modules even needed by the cluster
software if GFS isn't used?

--
Greetings from Troels Arvin


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From troels at arvin.dk  Wed May 24 12:16:50 2006
From: troels at arvin.dk (Troels Arvin)
Date: Wed, 24 May 2006 14:16:50 +0200
Subject: [Linux-cluster] RE: Please update cman-kernel and dlm-kernel
	when	themain kernel is updated
References: <pan.2006.05.24.11.40.57.187000@arvin.dk>
	<FF2CE0D593AEE34B955FEC77BD5AFBE0079EA1@tmaemail.techma.com>
Message-ID: <pan.2006.05.24.12.16.49.219000@arvin.dk>

On Wed, 24 May 2006 07:58:59 -0400, Kovacs, Corey J. wrote:
> Before you go slapping redhat around I think you should start checking what
> updates are
> being done and why. If you are running a cluster, it's your responsebility to
> make sure
> any updates are first "needed" and second, are going to mesh with your
> config. 

And so I did, and I didn't have production down-time.

About up2date: Using up2date is the official way of doing things,
according to all Red Hat documentation I've read.

About using a mailing list for complaints: Yes, I might post a
complaint/trouble report through the formal (contract-based) support.
Still, I find it highly valuable for other users to be aware of this
issue. And if a Red Hat clustering mailing list cannot be used to post
somewhat critical messages about Red Hat clustering, then I don't know
where it should be.

-- 
Greetings from Troels Arvin




From cjk at techma.com  Wed May 24 12:36:19 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 24 May 2006 08:36:19 -0400
Subject: [Linux-cluster] RE: Please update cman-kernel and
	dlm-kernelwhen	themain kernel is updated
In-Reply-To: <pan.2006.05.24.12.16.49.219000@arvin.dk>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EA2@tmaemail.techma.com>

Alright, this is what happens when I have too muich sugur in the morning.

However, I am curious about your first comment about not having production 
down time. To me, that reads as "Was not authorized to bring the system down"
to which I'd reply, why was it done?


On up2date being the official way to update systems? You're kidding right?
Maybe I'm old-school but I don't want ANY automated process updating my
systems.
At best, if I could ( I know it can be, just not for me), I'd use it to tell
me 
if updates were available and download them for me. I'd never have them
update 
my systems  automatically without testing them in MY environment, period.
True 
enough that it's a good tool and generally works well (I think). However, in 
the end, keeping the machines running is MY responsebility, NOT RedHat's.

And as for using mailing lists, your spot on, almost. The issue is not that a
mailing
list is being used to lodge complaints, rather the issue is that the correct
mailing
list is NOT being used. If there is an issue with the up2date process, then
you need
to complain on the "up2date" mailing list or wherever all things up2date are
discussed.


Again, this is not to start a war, just point out some things. I don't know
under what
conditions you or your systems were in so I cannot and should not comment on
your
individual experience. I am sure the circumstances dictated what you did and
you were
not happy with the results. I certainly understand that and it's unfortunate.
I also
agree that it would be nice if the updates coincided with each other. I've
had to wait
up to three weeks before with people breathing down my neck about RHEL3
update 7 and 
the corrosponding GFS update. I know that that's like believe me.

Anyway, lets all get back to discussing clusters.


Cheers


Corey



-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Troels Arvin
Sent: Wednesday, May 24, 2006 8:17 AM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] RE: Please update cman-kernel and dlm-kernelwhen
themain kernel is updated

On Wed, 24 May 2006 07:58:59 -0400, Kovacs, Corey J. wrote:
> Before you go slapping redhat around I think you should start checking 
> what updates are being done and why. If you are running a cluster, 
> it's your responsebility to make sure any updates are first "needed" 
> and second, are going to mesh with your config.

And so I did, and I didn't have production down-time.

About up2date: Using up2date is the official way of doing things, according
to all Red Hat documentation I've read.

About using a mailing list for complaints: Yes, I might post a
complaint/trouble report through the formal (contract-based) support.
Still, I find it highly valuable for other users to be aware of this issue.
And if a Red Hat clustering mailing list cannot be used to post somewhat
critical messages about Red Hat clustering, then I don't know where it should
be.

--
Greetings from Troels Arvin


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From lhh at redhat.com  Wed May 24 13:18:30 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 24 May 2006 09:18:30 -0400
Subject: [Linux-cluster] [Fwd: Configure FC2-133 with DS4100 On Linux RHEL
	4.2]
Message-ID: <1148476710.20766.51.camel@ayanami.boston.redhat.com>

*bounce*

I think this was meant to come here.  I do not have anything to add; I
have never used this particular piece of hardware.

-- Lon
-------------- next part --------------
An embedded message was scrubbed...
From: Milis <milis at ogs-id.com>
Subject: Configure FC2-133 with DS4100 On Linux RHEL 4.2
Date: Wed, 24 May 2006 10:14:34 +0700
Size: 3312
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/ef804821/attachment.eml>

From jim at ond.vein.hu  Wed May 24 13:49:38 2006
From: jim at ond.vein.hu (Gyimesi Attila)
Date: Wed, 24 May 2006 15:49:38 +0200
Subject: [Linux-cluster] two nodes active-active cluster
Message-ID: <44746472.50804@ond.vein.hu>

Hello,

I'd like to set up an active-active cluster.
I have two servers and one external storage (Brocade SAN, fibre channel).
The first thing I want is to use the same partition on the storage form both
servers at the same time.

Is it possible?
How?

Regards,
       Attila



From cjk at techma.com  Wed May 24 14:40:42 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 24 May 2006 10:40:42 -0400
Subject: [Linux-cluster] Configuring NFS on Red Hat Enterprise Linux 4AS /
	Cluster Suite / GFS 6.1
In-Reply-To: <1148423233.6545.43.camel@technetium.msp.redhat.com>
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EA3@tmaemail.techma.com>

Bob, in looking through your document, first I'd like to say "Thanks". Job
well done.


Second, I have a question about how RHCS (either version) handles the rmtab.

At a glance it appears that the cluster simply keeps a copy of the rmtab for 
a given NFS instance sync'd wioth a copy on the exported filesystem itself. 

For a single point of entry for NFS failover, it's a no-brainer and simply
"works".

However, for a multi server config like you describe where all servers have a
piece of the pie so to speak, how do does the cluster keep from killing the
existing
rmtab file on the failover target. 

That is if I have serverA and serverB, both running clusterred NFS services,
and serverA
buys the farm, it seems that when the rmtab from serverA gets sync'd over to
serverB, then
serverB's rmtab gets squashed.

That's a very simplistic view of things but I've not taken the time to go
code diving
to ge the answer of "How, at a low level, is the rmtab handled"...


Thanks again for a good read.


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert S Peterson
Sent: Tuesday, May 23, 2006 6:27 PM
To: linux clustering
Subject: Re: [Linux-cluster] Configuring NFS on Red Hat Enterprise Linux 4AS
/ Cluster Suite / GFS 6.1

On Tue, 2006-05-23 at 16:53 -0400, Sutton, Harry (MSE) wrote:
> This seems like a fairly straightforward thing to do, but I'm having a 
> heck of a time accomplishing it.

Hi Harry,

Since I've got some experience in this matter, earlier in the year, I tried
to write an "NFS/GFS Cookbook" that was a nuts-and-bolts
("example-driven") approach to configuring NFS to work with Cluster Suite and
GFS exactly as you're describing.

So far, I haven't had much time to refine it and make it into an officially
sanctioned "Red Hat Document."  It's just a draft.  However, because I'm
seeing more questions about this lately, I've decided to make it available on
our cvs source directory here:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf 

I hope this helps.
Just keep in mind: This is a draft.  There may be mistakes.
If you let me know what they are, I can fix the doc though...
Hope this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From Donald.Deasy at sli-institute.ac.uk  Wed May 24 15:04:05 2006
From: Donald.Deasy at sli-institute.ac.uk (Donald Deasy)
Date: Wed, 24 May 2006 16:04:05 +0100
Subject: [Linux-cluster] LVM and GFS How to recover lost volume group
	information
Message-ID: <7D7A68F7F09DAE40AE47E55F7F601D8B761BF5@SLISERVER21.sli-institute.ac.uk>

Hi Everyone

I have a server connected to an EMC CX300 SAN and yesterday I had a
glitch on the server and lost all kernel devices.
After running depmod and rebooting all the devices are back.

Unfortunately LVM is only listing the server internal hard disks volume
group VolGroup00.

The SAN volume groups have disappeared and all the SAN LUN disks are
showing up as "Uninitialized Entities".

I'm assuming that the SAN LUNs are okay and I just need to recreate the
LVM Metadata (without destroying the data on the SAN LUNs).

Question 1
Is it possible to do this ?

OS Redhat Enterprise 4 2.6.9-11.Elsmp

Configured Using
EMCpower.LINUX-4.4.0-343
Qla2xxx-v8.00.03b-1

Linux Cluster Suite 6.0
ccs-1.0.2-0
cman-1.0.2-0

LVM: 
Disk Entity: /dev/sdb1
Size: 536.48GB
Partition Type: Linux (0x83)
File System: No Filesystem

I used system-config-lvm to create logical volumes from file storage
LUNS presented from an EMC CX300 SAN.

I then used the following command to create the volumes

gfs_mkfs -p lock_dlm -t isli_cluster:sligfsa -j 4
/dev/VolGroupsli1/LogVolsli1disk1

The volumes have been running without problem mounted through Cluster
resources
GFS Resource
Name: slidska
Mount Point: /mnt/slidska
Device: /dev/VolGroupsli1/LogVolsli1disk1
Options acl

Or by hand
mount -t gfs -o acl /dev/VolGroupsli1/LogVolsli1disk1 /mnt/slidska


 
Donald Deasy
Systems Manager
Institute for System Level Integration
Tel: +44 1506 469311 (Direct)

ISLI provides postgraduate education, professional training and research
in system level integration incorporating cross over technologies such
as hardware, embedded software, MNT/MEMS.  Details of our activities can
be found at: www.sli-institute.ac.uk <http://www.sli-institute.ac.uk/> 

 



From rpeterso at redhat.com  Wed May 24 16:43:35 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Wed, 24 May 2006 11:43:35 -0500
Subject: [Linux-cluster] Configuring NFS on Red Hat Enterprise Linux
	4AS / Cluster Suite / GFS 6.1
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EA3@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079EA3@tmaemail.techma.com>
Message-ID: <1148489015.6545.76.camel@technetium.msp.redhat.com>

On Wed, 2006-05-24 at 10:40 -0400, Kovacs, Corey J. wrote:
> Bob, in looking through your document, first I'd like to say "Thanks". Job
> well done.
> 
> 
> Second, I have a question about how RHCS (either version) handles the rmtab.
> 
> At a glance it appears that the cluster simply keeps a copy of the rmtab for 
> a given NFS instance sync'd wioth a copy on the exported filesystem itself. 
> 
> For a single point of entry for NFS failover, it's a no-brainer and simply
> "works".
> 
> However, for a multi server config like you describe where all servers have a
> piece of the pie so to speak, how do does the cluster keep from killing the
> existing
> rmtab file on the failover target. 
> 
> That is if I have serverA and serverB, both running clusterred NFS services,
> and serverA
> buys the farm, it seems that when the rmtab from serverA gets sync'd over to
> serverB, then
> serverB's rmtab gets squashed.
> 
> That's a very simplistic view of things but I've not taken the time to go
> code diving
> to ge the answer of "How, at a low level, is the rmtab handled"...
> 
> 
> Thanks again for a good read.
> 
> 
> Corey

Hi Corey,

Your question hinges on whether your NFS servers are set up with
virtual IPs, or real "NFS services" managed by rgmanager.

In the case of virtual IPs, as far as I know, nothing is managed except
the virtual IP itself, and therefore all the management of fstab and
other nfs-related files like rmtab are all on the user's shoulders.

With rgmanager managing an nfs service, however, things are better.
There the rmtab stuff is all handled by an rgmanager daemon known as
clurmtabd whose job is to keep the rmtab information in sync
between nodes configured for the nfs service.  It tries to figure out
clustered resources and keep them separate from non-clustered
resources.  BTW, the daemon is in user-space, so the kernel knows
nothing about all this.

I'm afraid I don't know much more about the clurmtabd daemon than that
because I haven't studied that particular area of code much.  I'm more
of a gfs guy than an rgmanager guy.  But hopefully this helps.

Regards,

Bob Peterson
Red Hat Cluster Suite




From mbrookov at mines.edu  Wed May 24 17:02:30 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Wed, 24 May 2006 11:02:30 -0600
Subject: [Linux-cluster] Upgrade from Enterprise 3 to Enterprise 4
Message-ID: <1148490150.3785.24.camel@merlin.Mines.EDU>

For various reasons, I am getting ready to upgrade from Enterprise 3 and
GFS 6.0 to Enterprise 4 and GFS 6.1.

The upgrade section of the GFS 6.1 administrators guide has instructions
to convert pools to LVM2 and says to see the Configuring and Managing a
Cluster guide to convert a GFS file system from GULM to DLM.  I have not
carefully read the Configuring and Managing a Cluster Guide, but a quick
scan of the index and table of contents did not reveal the instructions.

So, my first question, how do I switch an existing file system in GFS
6.1 from GULM to DLM?

The second question, what kind of experiences have people had with an
upgrade of a cluster from Enterprise 3 AS update 6 to Enterprise 4 AS
update 3?

Thank you

Matt
mbrookov at mines.edu



From rpeterso at redhat.com  Wed May 24 18:18:13 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Wed, 24 May 2006 13:18:13 -0500
Subject: [Linux-cluster] Upgrade from Enterprise 3 to Enterprise 4
In-Reply-To: <1148490150.3785.24.camel@merlin.Mines.EDU>
References: <1148490150.3785.24.camel@merlin.Mines.EDU>
Message-ID: <1148494693.3344.7.camel@technetium.msp.redhat.com>

Hi Matthew,

On Wed, 2006-05-24 at 11:02 -0600, Matthew B. Brookover wrote:
> For various reasons, I am getting ready to upgrade from Enterprise 3 and
> GFS 6.0 to Enterprise 4 and GFS 6.1.
> 
> The upgrade section of the GFS 6.1 administrators guide has instructions
> to convert pools to LVM2 and says to see the Configuring and Managing a
> Cluster guide to convert a GFS file system from GULM to DLM.  I have not
> carefully read the Configuring and Managing a Cluster Guide, but a quick
> scan of the index and table of contents did not reveal the instructions.
> 
> So, my first question, how do I switch an existing file system in GFS
> 6.1 from GULM to DLM?

The GFS filesystem format is identical from RHEL3 to RHEL4.
If you want to switch from GULM to DLM (a wise move), do this:

gfs_tool sb <your device> proto lock_dlm

For example:

[root at trin-12 ~]# gfs_tool sb /dev/bobs_vg/lvol0 proto lock_dlm

> The second question, what kind of experiences have people had with an
> upgrade of a cluster from Enterprise 3 AS update 6 to Enterprise 4 AS
> update 3?

I'm sure other people would be willing to share their stories.

> Thank you
> 
> Matt
> mbrookov at mines.edu

I also recommend making a good backup or two of your fs before starting.

Regards,

Bob Peterson
Red Hat Cluster Suite




From Donald.Deasy at sli-institute.ac.uk  Wed May 24 18:28:41 2006
From: Donald.Deasy at sli-institute.ac.uk (Donald Deasy)
Date: Wed, 24 May 2006 19:28:41 +0100
Subject: [Linux-cluster] LVM and GFS How to recover lost volume
	groupinformation
In-Reply-To: <7D7A68F7F09DAE40AE47E55F7F601D8B761BF5@SLISERVER21.sli-institute.ac.uk>
Message-ID: <7D7A68F7F09DAE40AE47E55F7F601D8B761BF6@SLISERVER21.sli-institute.ac.uk>

Hi Everyone

Cancel the query.

I've ran out of time so I've just recreated the volumes and restoring
from backups. 

It's starting to look like "operator error". 

I suspect that whilst upgrading to kernel 2.6.9-34 on another machine in
the cluster I removed the LUNs from LVM. I had a bit of bother getting
the lvm working and eventually recompiled from src the lvm2-cluster
(version 2.02.01-1.2.RHEL4).  


Donald


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Donald Deasy
Sent: 24 May 2006 16:04
To: linux-cluster at redhat.com
Subject: [Linux-cluster] LVM and GFS How to recover lost volume
groupinformation

Hi Everyone

I have a server connected to an EMC CX300 SAN and yesterday I had a
glitch on the server and lost all kernel devices.
After running depmod and rebooting all the devices are back.

Unfortunately LVM is only listing the server internal hard disks volume
group VolGroup00.

The SAN volume groups have disappeared and all the SAN LUN disks are
showing up as "Uninitialized Entities".

I'm assuming that the SAN LUNs are okay and I just need to recreate the
LVM Metadata (without destroying the data on the SAN LUNs).

Question 1
Is it possible to do this ?

OS Redhat Enterprise 4 2.6.9-11.Elsmp

Configured Using
EMCpower.LINUX-4.4.0-343
Qla2xxx-v8.00.03b-1

Linux Cluster Suite 6.0
ccs-1.0.2-0
cman-1.0.2-0

LVM: 
Disk Entity: /dev/sdb1
Size: 536.48GB
Partition Type: Linux (0x83)
File System: No Filesystem

I used system-config-lvm to create logical volumes from file storage
LUNS presented from an EMC CX300 SAN.

I then used the following command to create the volumes

gfs_mkfs -p lock_dlm -t isli_cluster:sligfsa -j 4
/dev/VolGroupsli1/LogVolsli1disk1

The volumes have been running without problem mounted through Cluster
resources GFS Resource
Name: slidska
Mount Point: /mnt/slidska
Device: /dev/VolGroupsli1/LogVolsli1disk1 Options acl

Or by hand
mount -t gfs -o acl /dev/VolGroupsli1/LogVolsli1disk1 /mnt/slidska


 
Donald Deasy
Systems Manager
Institute for System Level Integration
Tel: +44 1506 469311 (Direct)

ISLI provides postgraduate education, professional training and research
in system level integration incorporating cross over technologies such
as hardware, embedded software, MNT/MEMS.  Details of our activities can
be found at: www.sli-institute.ac.uk <http://www.sli-institute.ac.uk/> 

 

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From rajkum2002 at rediffmail.com  Wed May 24 19:07:12 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 24 May 2006 19:07:12 -0000
Subject: [Linux-cluster] GFS6.0 installation RHEL4?
Message-ID: <20060524190712.566.qmail@webmail49.rediffmail.com>

Hello,

We are considering GFS 6.0 (without cluster suite) installation on RHEL4 and RHEL3 servers. 

My questions are:

1. Can GFS6.0 run on RHEL4? 
2. Can there be a mix of RHEL3 and RHEL4 nodes running GFS6.0 accessing the same cluster filesystem?

Thanks in advance for your help,
Raj
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/50684f38/attachment.htm>

From mbrookov at mines.edu  Wed May 24 21:33:22 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Wed, 24 May 2006 15:33:22 -0600
Subject: [Linux-cluster] Upgrade from Enterprise 3 to Enterprise 4
In-Reply-To: <1148494693.3344.7.camel@technetium.msp.redhat.com>
References: <1148490150.3785.24.camel@merlin.Mines.EDU>
	<1148494693.3344.7.camel@technetium.msp.redhat.com>
Message-ID: <1148506402.3785.27.camel@merlin.Mines.EDU>

Thank you Bob, this looks good.

Matt
 
On Wed, 2006-05-24 at 13:18 -0500, Robert S Peterson wrote:

> Hi Matthew,
> 
> On Wed, 2006-05-24 at 11:02 -0600, Matthew B. Brookover wrote:
> > For various reasons, I am getting ready to upgrade from Enterprise 3 and
> > GFS 6.0 to Enterprise 4 and GFS 6.1.
> > 
> > The upgrade section of the GFS 6.1 administrators guide has instructions
> > to convert pools to LVM2 and says to see the Configuring and Managing a
> > Cluster guide to convert a GFS file system from GULM to DLM.  I have not
> > carefully read the Configuring and Managing a Cluster Guide, but a quick
> > scan of the index and table of contents did not reveal the instructions.
> > 
> > So, my first question, how do I switch an existing file system in GFS
> > 6.1 from GULM to DLM?
> 
> The GFS filesystem format is identical from RHEL3 to RHEL4.
> If you want to switch from GULM to DLM (a wise move), do this:
> 
> gfs_tool sb <your device> proto lock_dlm
> 
> For example:
> 
> [root at trin-12 ~]# gfs_tool sb /dev/bobs_vg/lvol0 proto lock_dlm
> 
> > The second question, what kind of experiences have people had with an
> > upgrade of a cluster from Enterprise 3 AS update 6 to Enterprise 4 AS
> > update 3?
> 
> I'm sure other people would be willing to share their stories.
> 
> > Thank you
> > 
> > Matt
> > mbrookov at mines.edu
> 
> I also recommend making a good backup or two of your fs before starting.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/a7241a3c/attachment.htm>

From sgrieve at star-telegram.com  Thu May 25 03:03:12 2006
From: sgrieve at star-telegram.com (Grieve, Shane)
Date: Wed, 24 May 2006 22:03:12 -0500
Subject: [Linux-cluster] Kernel Panic DLM gfs accesing files
Message-ID: <C319D2BCA4F21E45B3403D4C1707014B02034805@FWSTEX03.fwst.fwstroot.star-telegram.com>

Hello,

        I am having problems accessing certain files shared to MAC

clients though Netatalk 2.0.3 and PC though though samba. When you try

and access the files usually text files though the console using vi etc

the session hangs. RedHat Enterprise 4 kernel 2.6.9-5 2 x Dell Poweredge

2850 HDS 9570 and fence_mcdata.

 

Thanks

Shane    

 

May 23 08:51:10 maccluster1 kernel: ------------[ cut here ]------------

May 23 08:51:10 maccluster1 kernel: kernel BUG at

/root/cluster/gfs-kernel/src/dlm/plock.c:797!

May 23 08:51:10 maccluster1 kernel: invalid operand: 0000 [#7]

May 23 08:51:10 maccluster1 kernel: SMP 

May 23 08:51:10 maccluster1 kernel: Modules linked in: nfs appletalk(U)

nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core

lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) sunrpc md5 ipv6 button

battery ac uhci_hcd ehci_hcd e1000(U) floppy qla2300 qla2xxx

scsi_transport_fc sg dm_snapshot dm_zero dm_mirror ext3 jbd(U) dm_mod

megaraid_mbox(U) megaraid_mm(U) sd_mod scsi_mod

May 23 08:51:10 maccluster1 kernel: CPU:    1

May 23 08:51:10 maccluster1 kernel: EIP:    0060:[<f89b0480>]

Tainted: GF     VLI

May 23 08:51:10 maccluster1 kernel: EFLAGS: 00010246   (2.6.9-5.ELsmp) 

May 23 08:51:10 maccluster1 kernel: EIP is at plock_internal+0x1a2/0x29c

[lock_dlm]

May 23 08:51:10 maccluster1 kernel: eax: 00000001   ebx: f5d6cfc0   ecx:

ccff0e50   edx: f89b29d3

May 23 08:51:10 maccluster1 kernel: esi: 00008000   edi: 00000000   ebp:

de938c80   esp: ccff0e4c

May 23 08:51:10 maccluster1 kernel: ds: 007b   es: 007b   ss: 0068

May 23 08:51:10 maccluster1 kernel: Process afpd (pid: 18192,

threadinfo=ccff0000 task=e10cf330)

May 23 08:51:10 maccluster1 kernel: Stack: f89b29d3 f89b296d 0000031d

f89b2944 f89b2a6f 0a044081 00000000 f5dad740 

May 23 08:51:10 maccluster1 kernel:        de938cb0 00000000 e1009280

f5dad740 f5dad740 00000000 de938c80 00008c3c 

May 23 08:51:10 maccluster1 kernel:        00000000 f89b09c8 00000000

00000c3d 00000000 00008c3c 00000000 00000c3d 

May 23 08:51:10 maccluster1 kernel: Call Trace:

May 23 08:51:10 maccluster1 kernel:  [<f89b09c8>]

lm_dlm_plock+0x1a6/0x24a [lock_dlm]

May 23 08:51:10 maccluster1 kernel:  [<f8f27774>] gfs_lm_plock+0x34/0x44

[gfs]

May 23 08:51:10 maccluster1 kernel:  [<f8f32b9c>] gfs_lock+0xe4/0xf2

[gfs]

May 23 08:51:10 maccluster1 kernel:  [<f8f32ab8>] gfs_lock+0x0/0xf2

[gfs]

May 23 08:51:10 maccluster1 kernel:  [<c01686c5>]

fcntl_setlk64+0x156/0x291

May 23 08:51:10 maccluster1 kernel:  [<c0127d49>]

__mod_timer+0x101/0x10b

May 23 08:51:10 maccluster1 kernel:  [<c0124257>]

do_setitimer+0x16c/0x1c5

May 23 08:51:10 maccluster1 kernel:  [<c01562c4>] fget+0x3b/0x42

May 23 08:51:10 maccluster1 kernel:  [<c016481f>] sys_fcntl64+0x5a/0x7d

May 23 08:51:10 maccluster1 kernel:  [<c02c62a3>] syscall_call+0x7/0xb

May 23 08:51:10 maccluster1 kernel:  [<c02c007b>]

unix_detach_fds+0x2e/0x31

May 23 08:51:10 maccluster1 kernel: Code: ff ff a1 a0 48 31 c0 50 68 6f

2a 9b f8 68 44 29 9b f8 68 1d 03 00 00 68 6d 29 9b f8 e8 90 07 77 c7 68

d3 29 9b f8 e8 86 07 77 c7 <0f> 0b 1d 03 44 29 9b f8 68 f1 29 9b f8 e8

2d ff 76 c7 89 5c 24

 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060524/1c480e49/attachment.htm>

From ahmadz at amti.com.ph  Thu May 25 03:59:00 2006
From: ahmadz at amti.com.ph (Ahmadz S. Taneo)
Date: Thu, 25 May 2006 11:59:00 +0800
Subject: [Linux-cluster] HELP!! HOW TO USE BMC OF DELL PE2850 AS FENCE
	DEVICE FOR REDHAT CLUSTER MANAGEMENT SUITE 3.0
In-Reply-To: <024801c67faf$0912cff0$19f2a8c0@mstaneo>
Message-ID: <024d01c67faf$8ee192b0$19f2a8c0@mstaneo>

Hello Guys,

Good day!

I need to use BMC of PowerEdge 2850 as a fence device for Redhat Linux
Cluster Manager Suite 3.0 on RHEL ES 3.0 Update 7. As far as I know BMC of
DELL PowerEdge 2850 is an IPMI over LAN embedded on motherboard network
interface card. I was trying to configure fence device on the cluster
manager tool but the parameter requires a "port" number which I couldn't
provide. I have tried accessing BMC Utility on boot up but couldn't see any
port number that I can use. I was able to configure on the Utiliy the
ipaddress of the BMC abd turned On the VLANwith the default ID 1. Is this
correct?

I hope there is anyone here who can help me here.

Please advice and thank you in advance.

Ahmadz Taneo

"Think God. Think Good!!."



From rpeterso at redhat.com  Tue May 23 14:44:41 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Tue, 23 May 2006 09:44:41 -0500
Subject: [Linux-cluster] (no subject)
In-Reply-To: <C319D2BCA4F21E45B3403D4C1707014B020347FC@FWSTEX03.fwst.fwstroot.star-telegram.com>
References: <C319D2BCA4F21E45B3403D4C1707014B020347FC@FWSTEX03.fwst.fwstroot.star-telegram.com>
Message-ID: <1148395481.18630.29.camel@technetium.msp.redhat.com>

On Tue, 2006-05-23 at 09:19 -0500, Grieve, Shane wrote:
> Hello,
> 	I am having problems accessing certain files shared to MAC
> clients though Netatalk 2.0.3 and PC though though samba. When you try
> and access the files usually text files though the console using vi etc
> the session hangs. RedHat Enterprise 4 kernel 2.6.9-5 2 x Dell Poweredge
> 2850 HDS 9570 and fence_mcdata.
> 
> Thanks
> Shane    

Hi Shane,

That's a pretty old kernel.  RHEL4 is now running 2.6.9-34 kernels and
quite a few gfs bugs have been fixed since 2.6.9-5.  Any way you can
get to RHEL4-U3 and try it again?

Regards,

Bob Peterson
Red Hat Cluster Suite




From bojan at rexursive.com  Fri May 26 03:05:03 2006
From: bojan at rexursive.com (Bojan Smojver)
Date: Fri, 26 May 2006 13:05:03 +1000
Subject: [Linux-cluster] RHEL4/GFS updates
Message-ID: <20060526130503.2me6ld81k4o088g8@www.rexursive.com>

A few warnings for folks running RHEL4/GFS on HP BL25p blades:

- update your firmware to the latest version on BL25p blades, or  
kernel 2.6.9-34.0.1smp (x86_64) will hang on boot (this is GFS/cluster  
unrelated)

- if you update one member of the cluster with the new fenced and  
reboot it (with other members being on with old fenced), once the new  
member comes back, others will go all weird and drop out of the  
cluster, followed by a panic on OOPS if you try to reboot them; the  
cluster won't work that way at all; best do them all at once (i.e.  
schedule downtime)

I found out the hard way... (i.e. nothing in fenced release notes about this).

-- 
Bojan



From rajkum2002 at rediffmail.com  Fri May 26 17:37:47 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 26 May 2006 17:37:47 -0000
Subject: [Linux-cluster] GFS6.0 installation RHEL4?
Message-ID: <20060526173747.16363.qmail@webmail67.rediffmail.com>

Hello,

Can someone please tell me if I could install GFS 6.0 on RHEL4. When I tried to install the latest GFS src rpm it complained about the kernel source package 2.4.x. I modified the gfs-build.spec to include the current kernel name 2.6.x. However, rpmbuild complained that it couldn't find source package and that failed. 

Can someone please tell if GFS 6.0 can be installed on RHEL 4? If yes, please provide information about the compilation procedure. I have compiled GFS several times on RHEL 3 but not on RHEL 4. So info on changes that must be made to compile GFS 6.0 will be very helpful.

Thanks very much. Please respond!
Raj


On Thu, 25 May 2006 Raj  Kumar wrote :
>Hello,
>
>We are considering GFS 6.0 (without cluster suite) installation on RHEL4 and RHEL3 servers.
>
>My questions are:
>
>1. Can GFS6.0 run on RHEL4?
>2. Can there be a mix of RHEL3 and RHEL4 nodes running GFS6.0 accessing the same cluster filesystem?
>
>Thanks in advance for your help,
>Raj
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060526/1e6fb24c/attachment.htm>

From rpeterso at redhat.com  Fri May 26 18:13:42 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Fri, 26 May 2006 13:13:42 -0500
Subject: [Linux-cluster] GFS6.0 installation RHEL4?
In-Reply-To: <20060526173747.16363.qmail@webmail67.rediffmail.com>
References: <20060526173747.16363.qmail@webmail67.rediffmail.com>
Message-ID: <1148667222.5742.2.camel@technetium.msp.redhat.com>

On Fri, 2006-05-26 at 17:37 +0000, Raj Kumar wrote:
> Hello,
> 
> Can someone please tell me if I could install GFS 6.0 on RHEL4. When I
> tried to install the latest GFS src rpm it complained about the kernel
> source package 2.4.x. I modified the gfs-build.spec to include the
> current kernel name 2.6.x. However, rpmbuild complained that it
> couldn't find source package and that failed. 
> 
> Can someone please tell if GFS 6.0 can be installed on RHEL 4? If yes,
> please provide information about the compilation procedure. I have
> compiled GFS several times on RHEL 3 but not on RHEL 4. So info on
> changes that must be made to compile GFS 6.0 will be very helpful.
> 
> Thanks very much. Please respond!
> Raj
Hi Raj,

You probably want GFS 6.1 for RHEL4.  And I recommend going to RHEL4-U3
or higher too.

Regards,

Bob Peterson
Red Hat Cluster Suite




From rajkum2002 at rediffmail.com  Fri May 26 19:11:39 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 26 May 2006 19:11:39 -0000
Subject: [Linux-cluster] GFS6.0 installation RHEL4?
Message-ID: <20060526191139.18641.qmail@webmail32.rediffmail.com>

Thanks Bob for replying to my post. I appreciate it.

We have a mix of RHEL 3 and RHEL 4 servers. Some of the RHEL 3 nodes cannot be upgraded to RHEL 4 and RHEL 4 nodes cannot be downgraded. 
But both RHEL 3 and RHEL 4 servers need to access the GFS filesystem. 
I read in the documentation that GFS6.1 needs RHEL4 or later. So GFS6.1 cannot be installed on RHEL 3 nodes. That's why I am trying to get GFS6.0 working on RHEL 4. 

Can GFS 6.0 be installed on RHEL 4? If so, can you please provide some pointers. It's very important for us to get this configuration working soon. Please help. 

Can RHEL 3 nodes w/ GFS6.0 and rHEL 4 nodes w/GFS6.1 interoperate... I mean can they access the same clustered file system. Probably not. The configuration is much different. 

Thanks,
Raj

On Fri, 26 May 2006 Robert S Peterson wrote :
>On Fri, 2006-05-26 at 17:37 +0000, Raj Kumar wrote:
> > Hello,
> >
> > Can someone please tell me if I could install GFS 6.0 on RHEL4. When I
> > tried to install the latest GFS src rpm it complained about the kernel
> > source package 2.4.x. I modified the gfs-build.spec to include the
> > current kernel name 2.6.x. However, rpmbuild complained that it
> > couldn't find source package and that failed.
> >
> > Can someone please tell if GFS 6.0 can be installed on RHEL 4? If yes,
> > please provide information about the compilation procedure. I have
> > compiled GFS several times on RHEL 3 but not on RHEL 4. So info on
> > changes that must be made to compile GFS 6.0 will be very helpful.
> >
> > Thanks very much. Please respond!
> > Raj
>Hi Raj,
>
>You probably want GFS 6.1 for RHEL4.  And I recommend going to RHEL4-U3
>or higher too.
>
>Regards,
>
>Bob Peterson
>Red Hat Cluster Suite
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060526/43f3bcbb/attachment.htm>

From rajkum2002 at rediffmail.com  Fri May 26 19:14:06 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 26 May 2006 19:14:06 -0000
Subject: [Linux-cluster] GFS6.0 installation RHEL4?
Message-ID: <20060526191406.2072.qmail@webmail28.rediffmail.com>

Just to clarify...

Basically I need to configure RHEL 3 and RHEL 4 nodes to access our SAN via GFS.  What GFS version  should I use?

Thanks!
Raj


On Sat, 27 May 2006 Raj  Kumar wrote :
>Thanks Bob for replying to my post. I appreciate it.
>
>We have a mix of RHEL 3 and RHEL 4 servers. Some of the RHEL 3 nodes cannot be upgraded to RHEL 4 and RHEL 4 nodes cannot be downgraded.
>But both RHEL 3 and RHEL 4 servers need to access the GFS filesystem.
>I read in the documentation that GFS6.1 needs RHEL4 or later. So GFS6.1 cannot be installed on RHEL 3 nodes. That's why I am trying to get GFS6.0 working on RHEL 4.
>
>Can GFS 6.0 be installed on RHEL 4? If so, can you please provide some pointers. It's very important for us to get this configuration working soon. Please help.
>
>Can RHEL 3 nodes w/ GFS6.0 and rHEL 4 nodes w/GFS6.1 interoperate... I mean can they access the same clustered file system. Probably not. The configuration is much different.
>
>Thanks,
>Raj
>
>On Fri, 26 May 2006 Robert S Peterson wrote :
> >On Fri, 2006-05-26 at 17:37 +0000, Raj Kumar wrote:
> > > Hello,
> > >
> > > Can someone please tell me if I could install GFS 6.0 on RHEL4. When I
> > > tried to install the latest GFS src rpm it complained about the kernel
> > > source package 2.4.x. I modified the gfs-build.spec to include the
> > > current kernel name 2.6.x. However, rpmbuild complained that it
> > > couldn't find source package and that failed.
> > >
> > > Can someone please tell if GFS 6.0 can be installed on RHEL 4? If yes,
> > > please provide information about the compilation procedure. I have
> > > compiled GFS several times on RHEL 3 but not on RHEL 4. So info on
> > > changes that must be made to compile GFS 6.0 will be very helpful.
> > >
> > > Thanks very much. Please respond!
> > > Raj
> >Hi Raj,
> >
> >You probably want GFS 6.1 for RHEL4.  And I recommend going to RHEL4-U3
> >or higher too.
> >
> >Regards,
> >
> >Bob Peterson
> >Red Hat Cluster Suite
> >
> >
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060526/d6dffeff/attachment.htm>

From jason at monsterjam.org  Fri May 26 20:20:15 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 26 May 2006 16:20:15 -0400
Subject: [Linux-cluster] overwriting old cca config
Message-ID: <20060526202015.GA29753@monsterjam.org>

so Im upgrading from GFS 6.0 to 6.1 and want to run 
 pvcreate /dev/sdb1 
and I get 
 Can't initialize physical volume "/dev/sdb1" of volume group "pool_cca" without -ff
which is from the old GFS I had on this disk.. How do I correctly remove this? Im not interested 
in upgrading anything, just blowing it away and starting over.

regards,
Jason



From baro at democritos.it  Sun May 28 22:20:31 2006
From: baro at democritos.it (Moreno Baricevic)
Date: Mon, 29 May 2006 00:20:31 +0200 (CEST)
Subject: [Linux-cluster] Inconsistent cluster view, shutdown, kernel panic
Message-ID: <Pine.LNX.4.64.0605251810370.24245@democritos.sissa.it>


Hello,

we are trying to install GFS (cluster-1.02 on vanilla 2.6.16.16) on a 
CentOS cluster of 70 "diskless" nodes.

The structure is something like this:

+---+   GNBD-SERVERS                  GNBD CLIENTS
|   |-----[node63]-----[node64 node65 node66 node67 node68 node69]
| S |.....
| A |.....
| N |-----[node07]-----[node08 node09 node10 node11 node12 node13]
|   |-----[node00]-----[node01 node02 node03 node04 node05 node06]
+---+

All the nodes have a gigabit NIC and all the nodes see each other.
Only the gnbd-servers have a fiber adapter to connect to the SAN.

Everything works fine as far as we test on 33 nodes: 9 nodes with the 
fiber adapter (acting as both GFS nodes and gnbd-servers) and 24 gnbd 
clients (connected to 4 of the gnbd-servers). "Fine" means that we have 
been able to mount and use the GFS filesystem.

When we try to start cman on 39 nodes (or worst, when we try with 63
nodes), more or less half of the nodes soon get this:

 	"kernel panic - not syncing: membership stopped responding"

We tried to increase CMAN_CLUSTER_TIMEOUT and CMAN_QUORUM_TIMEOUT 
(/etc/init.d/cman), but the problem persists.

We tried to boot the nodes 10 at once, with a 2 minutes delay between 
groups. As soon as we reach the quorum (or one of the timeout?) the nodes 
start collapsing due to "Inconsistent cluster view", "Shutdown", "No 
response to messages".

We also tried the patch supplied as solution for the bug report 187777, 
but nothing changes.

Is there a limit on the number of nodes, a timeout, or any other issue 
that we didn't consider?

Here you can find the cluster.conf, logs from survived and dead nodes, 
tcpdump for UDP:6809, nodes' /proc/cluster/{status,nodes,services}:

 	http://www.democritos.it/~baro/gfs-test/

There's a lot of stuff, let me know if you need something more specific.


RTFM's are welcome.

Thanks in advance

Ciao

Moreno



From Tomasz.Koczorowski at centertel.pl  Mon May 29 08:41:07 2006
From: Tomasz.Koczorowski at centertel.pl (Tomasz Koczorowski)
Date: Mon, 29 May 2006 10:41:07 +0200
Subject: [Linux-cluster] Fencing problem
Message-ID: <7E36A75EF0ABE243954B6E5AFA35D86645AD2E@EXCH-BK.centertel.main>

Hi,
 
I have a problem with RHCS 4 in two node configuration (wayne and
eastwood).
Service ucpgw is running on wayne and httpd on eastwood.
Every node is a Sun V40z server, fencing is done by IPMI.
During cluster tests I unplug both power cables from one server (wayne),
thus
simulating unexpected poweroff (IPMI interface is also unavailable while
server
is out of power). Unfortunately cluster (with only one node active) is
not able to
start the missing service. Node wayne is able to start both services
only after a
reboot.

Clustat info before tests:

Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  wayne                                    Online, rgmanager
  eastwood                                 Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  ucpgw                wayne                          started
  httpd                eastwood                       started

Clustat info after tests (clustat waits about 10 seconds on a timeout
before
displaying this message):

Timed out waiting for a response from Resource Group Manager
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  wayne                                    Offline
  eastwood                                 Online, Local, rgmanager


 
Here are the logs from running node:

May 29 08:08:24 eastwood clurgmgrd: [3252]: <info> Executing
/etc/init.d/httpd status
May 29 08:08:37 eastwood kernel: CMAN: removing node wayne from the
cluster : Missed too many heartbeats
May 29 08:09:01 eastwood crond(pam_unix)[19600]: session opened for user
root by (uid=0)
May 29 08:09:01 eastwood crond(pam_unix)[19600]: session closed for user
root
May 29 08:10:01 eastwood crond(pam_unix)[19610]: session opened for user
root by (uid=0)
May 29 08:10:01 eastwood crond(pam_unix)[19610]: session closed for user
root
May 29 08:10:07 eastwood fenced[2854]: wayne not a cluster member after
90 sec post_fail_delay
May 29 08:10:07 eastwood fenced[2854]: fencing node "wayne"
May 29 08:11:01 eastwood crond(pam_unix)[19623]: session opened for user
root by (uid=0)
May 29 08:11:01 eastwood crond(pam_unix)[19623]: session closed for user
root
May 29 08:11:51 eastwood ntpd[2895]: can't open /var/ntp/ntp.drift.TEMP:
No such file or directory
May 29 08:12:01 eastwood crond(pam_unix)[19640]: session opened for user
root by (uid=0)
May 29 08:12:01 eastwood crond(pam_unix)[19640]: session closed for user
root
May 29 08:12:27 eastwood fenced[2854]: agent "fence_ipmilan" reports:
Rebooting machine @ IPMI:10.12.83.177...ipmilan: Failed to connect after
30 seconds Failed
May 29 08:12:27 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:32 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:32 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:32 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor
May 29 08:12:32 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:37 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:37 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:37 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor
May 29 08:12:37 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:42 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:42 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:42 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor
May 29 08:12:42 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:47 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:47 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:47 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor
May 29 08:12:47 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:52 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:52 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:52 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor
May 29 08:12:52 eastwood fenced[2854]: fence "wayne" failed
May 29 08:12:57 eastwood fenced[2854]: fencing node "wayne"
May 29 08:12:57 eastwood ccsd[2751]: process_get: Invalid connection
descriptor received.
May 29 08:12:57 eastwood ccsd[2751]: Error while processing get: Invalid
request descriptor

Here's my cluster config:

<?xml version="1.0"?>
<cluster config_version="75" name="ucp_cluster">
        <fence_daemon post_fail_delay="90" post_join_delay="30"/>
        <clusternodes>
                <clusternode name="wayne" votes="1">
                        <multicast addr="224.0.0.1" interface="bond0"/>
                        <fence>
                                <method name="1">
                                        <device name="ipmi-wayne"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="eastwood" votes="1">
                        <multicast addr="224.0.0.1" interface="bond0"/>
                        <fence>
                                <method name="1">
                                        <device name="ipmi-eastwood"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="224.0.0.1"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="none"
ipaddr="10.12.83.176" login="&quot;&quot;" name="ipmi-eastwood"
passwd="**********"/>
                <fencedevice agent="fence_ipmilan" auth="none"
ipaddr="10.12.83.177" login="&quot;&quot;" name="ipmi-wayne"
passwd="**********"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="ucp" ordered="1"
restricted="1">
                                <failoverdomainnode name="wayne"
priority="1"/>
                                <failoverdomainnode name="eastwood"
priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.12.17.135" interface="bond0"
monitor_link="0"/>
                        <ip address="10.12.17.136" monitor_link="1"/>
                        <fs device="/dev/ucpsmslogvg/ucpsmslog"
force_fsck="0" force_unmount="1" fsid="38769" fstype="ext3"
mountpoint="/ucpsmslog" name="ucpsmslog" options="" self_fence="1"/>
                        <fs device="/dev/ucpgwlogvg/ucpgwlog"
force_fsck="0" force_unmount="1" fsid="39307" fstype="ext3"
mountpoint="/ucpgwlog" name="ucpgwlog" options="" self_fence="1"/>
                        <script file="/home/ucpgw/bin/ucpgw"
name="ucpgw"/>
                        <script file="/etc/init.d/httpd" name="httpd"/>
                </resources>
                <service autostart="1" domain="ucp" name="ucpgw"
recovery="relocate">
                        <script ref="ucpgw"/>
                        <fs ref="ucpgwlog"/>
                        <ip ref="10.12.17.136"/>
                </service>
                <service autostart="1" domain="ucp" name="httpd"
recovery="relocate">
                        <ip ref="10.12.17.135"/>
                        <script ref="httpd"/>
                        <fs ref="ucpsmslog"/>
                </service>
        </rm>
</cluster>

Is this cluster misconfigured or is it a bug in fenced/ccsd subsystem?
How can I solve this problem?

Regards,

Tomasz Koczorowski




From eric at bootseg.com  Mon May 29 13:18:24 2006
From: eric at bootseg.com (Eric Kerin)
Date: Mon, 29 May 2006 09:18:24 -0400
Subject: [Linux-cluster] Fencing problem
In-Reply-To: <7E36A75EF0ABE243954B6E5AFA35D86645AD2E@EXCH-BK.centertel.main>
References: <7E36A75EF0ABE243954B6E5AFA35D86645AD2E@EXCH-BK.centertel.main>
Message-ID: <1148908704.2630.7.camel@auh5-0479.corp.jabil.org>

On Mon, 2006-05-29 at 10:41 +0200, Tomasz Koczorowski wrote:
> Hi,
>  
> I have a problem with RHCS 4 in two node configuration (wayne and
> eastwood).
> Service ucpgw is running on wayne and httpd on eastwood.
> Every node is a Sun V40z server, fencing is done by IPMI.
> During cluster tests I unplug both power cables from one server (wayne),
> thus
> simulating unexpected poweroff (IPMI interface is also unavailable while
> server
> is out of power). 
> <SNIP>
> Is this cluster misconfigured or is it a bug in fenced/ccsd subsystem?
> How can I solve this problem?
> 

This is an inherent flaw in using the on-board control devices (ILO,
IPMI, etc) as fence devices.  Since the remaining node(s) can't
successfully fence the failed node, they won't continue.

Fenced also can't assume the machine is already powered down, since it
could be a network problem keeping it from accessing the other node (and
it's IPMI device)

I use two network accessible power controllers for fencing my cluster.
With each power supply hooked up to a different controller, providing
redundant power paths.

Thanks, 
Eric Kerin
eric at bootseg.com





From admin.cluster at gmail.com  Mon May 29 14:41:49 2006
From: admin.cluster at gmail.com (Anthony)
Date: Mon, 29 May 2006 16:41:49 +0200
Subject: [Linux-cluster] /etc/pam.d/su with ldap
Message-ID: <447B082D.1010803@gmail.com>

Hi, i have modified my /etc/su file so that it integrates Ldap,

i got a small problem, whenever the root user do a
$su - username1
password:

it asks me a passwor, i hit 'enter' then i get the prompt;

what is wrong with my su confg file ?


# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 2)

# uname -a
Linux cluster2 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:02 EDT 2005
x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/pam.d/su
#%PAM-1.0
#pam_ldap Added by me
auth       sufficient   pam_ldap.so
account    sufficient   pam_ldap.so
password   sufficient   pam_ldap.so

auth       sufficient   /lib/security/$ISA/pam_rootok.so
# Uncomment the following line to implicitly trust users in the "wheel"
group.
#auth       sufficient   /lib/security/$ISA/pam_wheel.so trust use_uid
# Uncomment the following line to require a user to be in the "wheel" group.
#auth       required     /lib/security/$ISA/pam_wheel.so use_uid
auth       required     /lib/security/$ISA/pam_stack.so service=system-auth
account    required     /lib/security/$ISA/pam_stack.so service=system-auth
#added try_first_pass
password   required     /lib/security/$ISA/pam_stack.so
service=system-auth try_first_pass
# pam_selinux.so close must be first session rule
session    required     /lib/security/$ISA/pam_selinux.so close
session    required     /lib/security/$ISA/pam_stack.so service=system-auth
# pam_selinux.so open and pam_xauth must be last two session rules
session    required     /lib/security/$ISA/pam_selinux.so open multiple
session    optional     /lib/security/$ISA/pam_xauth.so




From pcaulfie at redhat.com  Tue May 30 06:57:48 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 30 May 2006 07:57:48 +0100
Subject: [Linux-cluster] Inconsistent cluster view, shutdown, kernel panic
In-Reply-To: <Pine.LNX.4.64.0605251810370.24245@democritos.sissa.it>
References: <Pine.LNX.4.64.0605251810370.24245@democritos.sissa.it>
Message-ID: <447BECEC.1060406@redhat.com>

Moreno Baricevic wrote:
> 
> Hello,
> 
> we are trying to install GFS (cluster-1.02 on vanilla 2.6.16.16) on a
> CentOS cluster of 70 "diskless" nodes.
> 
> The structure is something like this:
> 
> +---+   GNBD-SERVERS                  GNBD CLIENTS
> |   |-----[node63]-----[node64 node65 node66 node67 node68 node69]
> | S |.....
> | A |.....
> | N |-----[node07]-----[node08 node09 node10 node11 node12 node13]
> |   |-----[node00]-----[node01 node02 node03 node04 node05 node06]
> +---+
> 
> All the nodes have a gigabit NIC and all the nodes see each other.
> Only the gnbd-servers have a fiber adapter to connect to the SAN.
> 
> Everything works fine as far as we test on 33 nodes: 9 nodes with the
> fiber adapter (acting as both GFS nodes and gnbd-servers) and 24 gnbd
> clients (connected to 4 of the gnbd-servers). "Fine" means that we have
> been able to mount and use the GFS filesystem.
> 
> When we try to start cman on 39 nodes (or worst, when we try with 63
> nodes), more or less half of the nodes soon get this:
> 
>     "kernel panic - not syncing: membership stopped responding"
> 
> We tried to increase CMAN_CLUSTER_TIMEOUT and CMAN_QUORUM_TIMEOUT
> (/etc/init.d/cman), but the problem persists.
> 
> We tried to boot the nodes 10 at once, with a 2 minutes delay between
> groups. As soon as we reach the quorum (or one of the timeout?) the
> nodes start collapsing due to "Inconsistent cluster view", "Shutdown",
> "No response to messages".
> 
> We also tried the patch supplied as solution for the bug report 187777,
> but nothing changes.
> 
> Is there a limit on the number of nodes, a timeout, or any other issue
> that we didn't consider?


To be honest, cman has never been tested beyond 32 nodes to my knowledge. for
large clusters you may well be better off using gulm - at least in the short-term.

> Here you can find the cluster.conf, logs from survived and dead nodes,
> tcpdump for UDP:6809, nodes' /proc/cluster/{status,nodes,services}:
> 
>     http://www.democritos.it/~baro/gfs-test/
> 
> There's a lot of stuff, let me know if you need something more specific.

I'll have a look through those logs...but it may take me some time !

Thanks,

Patrick

-- 

patrick



From Olivier.Thibault at lmpt.univ-tours.fr  Tue May 30 09:11:03 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Tue, 30 May 2006 11:11:03 +0200
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <447C0C27.5090906@lmpt.univ-tours.fr>

Hi,


I am testing RHCS on Fedora Core 5.
I have a shared gfs volume mounted on two nodes (using clvmd and lock_dlm).
Locally, everything is ok.
If I export the gfs volume via nfs, i obtain *very poor* performance.
For exemple, from a nfs client with dd, it take 90 seconds to create a 
16 MB file !!!
 From the cluster's nodes, the performances a good, and i made some 
tests exporting xfs over nfs, and it was good too.
So what's wrong with nfs+gfs ?
I would be very interested to know how guys who use this have configured 
it, and what performances they have.

Thanks for any advices.

Best regards

-- 
Olivier THIBAULT
Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
Universit? Fran?ois Rabelais
Parc de Grandmont - 37200 TOURS
T?l: +33 2 47 36 69 12
Fax: +33 2 47 36 69 56



From rajkum2002 at rediffmail.com  Tue May 30 14:59:18 2006
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 30 May 2006 14:59:18 -0000
Subject: [Linux-cluster] gfs export over nfs is very slow
Message-ID: <20060530145918.9049.qmail@webmail46.rediffmail.com>

Hi,

We are using GFS6.0 (no cluster suite) and NFS exports of the file system. I am getting a transfer rate of about 35MB/sec. We have a high speed SAN. Actually the transfer rate can be little higher but we attribute the slow rate to NFS itself since we see the same numbers for EXT3 also. 

Regards,
Raj


On Tue, 30 May 2006 Olivier Thibault wrote :
>Hi,
>
>
>I am testing RHCS on Fedora Core 5.
>I have a shared gfs volume mounted on two nodes (using clvmd and lock_dlm).
>Locally, everything is ok.
>If I export the gfs volume via nfs, i obtain *very poor* performance.
>For exemple, from a nfs client with dd, it take 90 seconds to create a 16 MB file !!!
> From the cluster's nodes, the performances a good, and i made some tests exporting xfs over nfs, and it was good too.
>So what's wrong with nfs+gfs ?
>I would be very interested to know how guys who use this have configured it, and what performances they have.
>
>Thanks for any advices.
>
>Best regards
>
>-- Olivier THIBAULT
>Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
>Universit? Fran?ois Rabelais
>Parc de Grandmont - 37200 TOURS
>T?l: +33 2 47 36 69 12
>Fax: +33 2 47 36 69 56
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060530/6c13dc02/attachment.htm>

From c_triantafillou at hotmail.com  Tue May 30 15:10:27 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Tue, 30 May 2006 17:10:27 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
Message-ID: <BAY112-F14E54CC60FBA3C9C51EB3093920@phx.gbl>

Hello,

I am trying to install DLM (the Distributed Lock Manager) on RedHat 
Enterprise Linux, the 64-bit x86_64 version.

I believe that all the required modules and RPMs are installed but the test 
programs from the RedHat cluster sources give the same error:
dlm error 2, No such file or directory
I get this from user tests like: asttest, dlmtest, lstest, pingtest, as well 
as from my own tests.

How can I confirm the proper installation of DLM on this platform and get 
these tests to run?

Regards,
Chris




From Olivier.Thibault at lmpt.univ-tours.fr  Tue May 30 15:22:36 2006
From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault)
Date: Tue, 30 May 2006 17:22:36 +0200
Subject: [Linux-cluster] gfs export over nfs is very slow
In-Reply-To: <20060530145918.9049.qmail@webmail46.rediffmail.com>
References: <20060530145918.9049.qmail@webmail46.rediffmail.com>
Message-ID: <447C633C.1090508@lmpt.univ-tours.fr>

Hi,

Raj Kumar a ?crit :
 > Hi,
 >
 > We are using GFS6.0 (no cluster suite) and NFS exports of the file 
system. I am getting a transfer rate of about 35MB/sec. We have a high 
speed SAN. Actually the transfer rate can be little higher but we 
attribute the slow rate to NFS itself since we see the same numbers for 
EXT3 also.
 >
 > Regards,
 > Raj
 >
 >

Thank you for your answer.
I am upgrading to last GFS/DLM/CMAN kernel stuff and will retry.
I've ran bonnie++ with ext3 exported over nfs and it is really speeder 
even if it's not what i expected. I got about 22 MB/s (r/w).
But i saw that nfsd was consuming a lot of CPU. The system load was 15 !!
I've also ran test with Suse SLES9 xfs exported over nfs. I got 40MB/s, 
which is what aim to get with GFS ...
I don't understand ...

Is there anybody who export gfs over nfs with FC5 ?

Thanks by advance

Olivier

> On Tue, 30 May 2006 Olivier Thibault wrote :
>> Hi,
>>
>>
>> I am testing RHCS on Fedora Core 5.
>> I have a shared gfs volume mounted on two nodes (using clvmd and lock_dlm).
>> Locally, everything is ok.
>> If I export the gfs volume via nfs, i obtain *very poor* performance.
>> For exemple, from a nfs client with dd, it take 90 seconds to create a 16 MB file !!!
>> From the cluster's nodes, the performances a good, and i made some tests exporting xfs over nfs, and it was good too.
>> So what's wrong with nfs+gfs ?
>> I would be very interested to know how guys who use this have configured it, and what performances they have.
>>
>> Thanks for any advices.
>>
>> Best regards
>>
>> -- Olivier THIBAULT
>> Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
>> Universit? Fran?ois Rabelais
>> Parc de Grandmont - 37200 TOURS
>> T?l: +33 2 47 36 69 12
>> Fax: +33 2 47 36 69 56
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Olivier THIBAULT
Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083)
Universit? Fran?ois Rabelais
Parc de Grandmont - 37200 TOURS
T?l: +33 2 47 36 69 12
Fax: +33 2 47 36 69 56



From filipe.miranda at gmail.com  Tue May 30 15:22:34 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Tue, 30 May 2006 12:22:34 -0300
Subject: [Linux-cluster] Oracle RAC and Cluster Suite,
	can they coexist on the same machine?
Message-ID: <a6d13c780605300822qa6c17b8n899f82f925541ef5@mail.gmail.com>

Hello there,

I would like to know if there is any problem when using the Oracle Real
Application Clusters to mange the Oracle database and Red Hat Cluster Suite
to manage an application on the same machine?

Remember that the application will not be entirely managed by the RHCS, it
will be ative on all nodes and the RHCS will only be used to manage virtual
IP reallocation and mounting points.

Is there any problems reported so far with this configuration?

Does Oracle RAC and RHCS can coexist without problems?

Any comments on this type of configuration?

-- 
Att.
---
Filipe T Miranda
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060530/20a98e5a/attachment.htm>

From c_triantafillou at hotmail.com  Tue May 30 15:36:16 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Tue, 30 May 2006 17:36:16 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F14E54CC60FBA3C9C51EB3093920@phx.gbl>
Message-ID: <BAY112-F23A1F35A19464B31856E1693920@phx.gbl>


Further to my last message, I found out that a few devices are missing:
/dev/misc/dlm_default
/dev/misc/dlm-control
How can I create those devices?

Regards,
Christos




From pcaulfie at redhat.com  Tue May 30 15:46:55 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 30 May 2006 16:46:55 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F14E54CC60FBA3C9C51EB3093920@phx.gbl>
References: <BAY112-F14E54CC60FBA3C9C51EB3093920@phx.gbl>
Message-ID: <447C68EF.60408@redhat.com>

Christos Triantafillou wrote:
> Hello,
> 
> I am trying to install DLM (the Distributed Lock Manager) on RedHat
> Enterprise Linux, the 64-bit x86_64 version.
> 
> I believe that all the required modules and RPMs are installed but the
> test programs from the RedHat cluster sources give the same error:
> dlm error 2, No such file or directory
> I get this from user tests like: asttest, dlmtest, lstest, pingtest, as
> well as from my own tests.
> 
> How can I confirm the proper installation of DLM on this platform and
> get these tests to run?


First, make sure the DLM module is loaded. Then look in /proc/misc to see if
the dlm device has been registered (if the module is loaded then it should be)
and that the protection on the dlm-control and/or dlm_<lockspace> device files
is appropriate.

Also, are you trying to run the tests as non-root? Only root can create a
lockspace so you will need to create the lockspace first (as root) then set
the permissions on the lockspace device for other users to access it.

The default lockspace (as used by dlmtest & asttest) will be created
dynamically, but there is a bug in the current version that prevents non-root
users from creating it.

-- 

patrick



From theo at tkd.co.id  Wed May 31 09:54:19 2006
From: theo at tkd.co.id (Theodorus)
Date: Wed, 31 May 2006 16:54:19 +0700
Subject: [Linux-cluster] sid for pid failed
In-Reply-To: <447C68EF.60408@redhat.com>
Message-ID: <TKDNETdsDb7xBNWep4C00000391@tkdnet.tkd.co.id>

I had a problem with some service on my linux system...
I'm using RHAS-3 with cluster manager

I got the error messages like this :


May 31 09:15:37 node2 pidof[28294]: can't read sid for pid 28303 
May 31 09:21:33 node2 pidof[4086]: can't read sid for pid 4080 
May 31 09:22:12 node2 pidof[4838]: can't read sid for pid 4843 
May 31 09:22:51 node2 pidof[5824]: can't read sid for pid 5834 
May 31 09:25:43 node2 pidof[9682]: can't read sid for pid 9697 
May 31 09:36:18 node2 clusvcmgrd[24107]: <warning> Restarting locally failed
service radbill 
May 31 09:36:18 node2 clusvcmgrd[24132]: <warning> Restarting locally failed
service ama_d 
May 31 09:36:19 node2 clusvcmgrd: [24289]: <notice> service notice: Stopping
service ama_d ... 
May 31 09:36:19 node2 clusvcmgrd: [24199]: <notice> service notice: Stopping
service radbill ... 
May 31 09:36:19 node2 clusvcmgrd: [24289]: <notice> service notice: Running
user script '/project/services/ama_start.sh stop' 
May 31 09:36:19 node2 clusvcmgrd: [24199]: <notice> service notice: Running
user script '/project/services/radbill_start.sh stop' 
May 31 09:36:19 node2 ama_start.sh: AmaMeterTelnet shutdown succeeded
May 31 09:36:19 node2 radbill_start.sh: RadbillTelnet shutdown succeeded
May 31 10:36:22 node2 pidof[15488]: can't read sid for pid 15496 
May 31 10:36:22 node2 pidof[15499]: can't read sid for pid 15502 
May 31 10:37:01 node2 pidof[16450]: can't read sid for pid 16458 
May 31 10:42:25 node2 pidof[23917]: can't read sid for pid 23913 
May 31 10:43:45 node2 pidof[25890]: can't read sid for pid 25896 
May 31 10:45:45 node2 sshd(pam_unix)[28479]: session opened for user root by
(uid=0)
May 31 10:46:29 node2 pidof[29534]: can't read sid for pid 29537 
May 31 10:48:58 node2 pidof[395]: can't read sid for pid 399


Those is just a piece from the message from linux log.
Can anyone tell me what is wrong or what is causing this problem...


Thnx in advance

theo



From c_triantafillou at hotmail.com  Wed May 31 11:35:44 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Wed, 31 May 2006 13:35:44 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <447C68EF.60408@redhat.com>
Message-ID: <BAY112-F131FFCABF18E95D066681E93930@phx.gbl>

Hello Patrick,

Thanks a lot for your reply.

I have created the devices like this:
>la /dev/misc
total 0
drwxrwxrwx   2 root root     80 May 30 21:32 .
drwxr-xr-x  11 root root   5820 May 30 21:33 ..
crwxrwxrwx   1 root root 10, 62 May 30 21:32 dlm-control
crwxrwxrwx   1 root root 10, 61 May 30 21:32 dlm_default

Now I can run dlmtest as root but as non-root I get this:
>dlmtest abcd
locking abcd EX ...
lock: Operation not permitted

Is it possible to use DLM as non-root?

Strace shows this at the end:
write(2, "locking abcd EX ...", 19)     = 19
brk(0)                                  = 0x502000
brk(0x523000)                           = 0x523000
open("/dev/misc/dlm_default", O_RDWR)   = -1 ENODEV (No such device)
stat("/dev/misc/dlm-control", {st_mode=S_IFCHR|0766, st_rdev=makedev(10, 
62), ...}) = 0
open("/proc/misc", O_RDONLY)            = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x2a9565b000
read(3, " 62 dlm-control\n183 hw_random\n 6"..., 1024) = 67
close(3)                                = 0
munmap(0x2a9565b000, 4096)              = 0
open("/dev/misc/dlm-control", O_RDWR)   = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
write(3, "\3\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\1\0\0\0\177\0\0\0defa"..., 143) = 
-1 EPERM (Operation not permitted)
write(2, "\n", 1)                       = 1
write(2, "lock: Operation not permitted\n", 30) = 30
exit_group(-1)                          = ?

Regards,
Christos




From pcaulfie at redhat.com  Wed May 31 14:02:15 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 31 May 2006 15:02:15 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F131FFCABF18E95D066681E93930@phx.gbl>
References: <BAY112-F131FFCABF18E95D066681E93930@phx.gbl>
Message-ID: <447DA1E7.2010505@redhat.com>

Christos Triantafillou wrote:
> Hello Patrick,
> 
> Thanks a lot for your reply.
> 
> I have created the devices like this:
>> la /dev/misc
> total 0
> drwxrwxrwx   2 root root     80 May 30 21:32 .
> drwxr-xr-x  11 root root   5820 May 30 21:33 ..
> crwxrwxrwx   1 root root 10, 62 May 30 21:32 dlm-control
> crwxrwxrwx   1 root root 10, 61 May 30 21:32 dlm_default
> 
> Now I can run dlmtest as root but as non-root I get this:
>> dlmtest abcd
> locking abcd EX ...
> lock: Operation not permitted
> 
> Is it possible to use DLM as non-root?
>

Yes, if you create the lockspace as root beforehand.

so: root needs to run a program (lstest would do) to create the lockspace.
Then you change the permissions on the dlm_<lockspace> device, either in the
same script or using udev config files.

You can't currently do this for the default lockspace, because it is removed
when it is no longer in use. That bug will be fixed in a future update.

-- 

patrick



From c_triantafillou at hotmail.com  Wed May 31 15:35:56 2006
From: c_triantafillou at hotmail.com (Christos Triantafillou)
Date: Wed, 31 May 2006 17:35:56 +0200
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <447DA1E7.2010505@redhat.com>
Message-ID: <BAY112-F34F0185F28129599A99A1A93930@phx.gbl>

Patrick,

I have followed your instructions and created a non-default lockspace.
Now all the user tests seem to work even if they do not explicitly open the 
new lockspace!

How do you explain that?

Regards,
Christos




From pcaulfie at redhat.com  Wed May 31 17:00:55 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 31 May 2006 18:00:55 +0100
Subject: [Linux-cluster] DLM & RedHat Enterprise Linux
In-Reply-To: <BAY112-F34F0185F28129599A99A1A93930@phx.gbl>
References: <BAY112-F34F0185F28129599A99A1A93930@phx.gbl>
Message-ID: <447DCBC7.9020803@redhat.com>

Christos Triantafillou wrote:
> Patrick,
> 
> I have followed your instructions and created a non-default lockspace.
> Now all the user tests seem to work even if they do not explicitly open
> the new lockspace!
> 
> How do you explain that?

erm, I can't. You mean that the users are using the default lockspace even
though the lockspace that was created by root was a different one? Strange.

If you give me a more exact description of what's happening I should be able
to untangle it.

-- 

patrick



From joe.warren-meeks at aggregator.tv  Wed May 31 16:38:37 2006
From: joe.warren-meeks at aggregator.tv (Joe Warren-Meeks)
Date: Wed, 31 May 2006 17:38:37 +0100
Subject: [Linux-cluster] Shared Filesystem
Message-ID: <1D3D791D-11A4-44EB-8167-CC88D11D6F4B@aggregator.tv>


Hey there,

I've got an iscsi array, on which I have a load of content to be  
served via http. I basically want to have a bunch of linux boxes  
mount the same partition read-only, via iscsi, so that they can do this.

Now, I know I need GFS and one of the locking daemons (possibly  
gulm?), but do I need anything else from the cluster suite?

Anyone got any pointers on where to look for this info? I don't want  
to set up a full-blown cluster, just share one partition between  
multiple machines.

Cheers!

  -- joe.

Joe Warren-Meeks                       T: +44 (0) 208 962 0007
Aggregator Limited                     M: +44 (0) 7789 176078
Unit 62/63 Pall Mall Deposit           E: joe.warren-meeks at aggregator.tv
124-128 Barlby Road, London W10 6BL
PGP Fingerprint: 361F 78D0 56F5 8D7F 2639  947D 71E2 8811 F825 64CC