From ciril at hcl.in  Wed Feb  1 03:57:06 2006
From: ciril at hcl.in (Ciril Ignatious T)
Date: Wed, 01 Feb 2006 09:27:06 +0530
Subject: [Linux-cluster] Linux Cluster
In-Reply-To: <20060131124751.67730.qmail@web52307.mail.yahoo.com>
References: <20060131124751.67730.qmail@web52307.mail.yahoo.com>
Message-ID: <43E03192.6090204@hcl.in>

Dear Suvankar

Oracle RAC  itself is a failover and  loadbalancing cluster.So there is 
no need for a cluster suite for the HA functionality.
You can directly install RHEL 3.0/4.0 in both machines and configure 10 
RAC on top of OS.
There is a very good documentation in the 10g RAC CD on how to set up 
the cluster.

Also what do you mean by OS level cluster?

Regards

Ciril
 
SUVANKAR MOITRA wrote:
> dear rajesh,
>
> Then what about Oracle 10g rac? How oracle 10g rac to
> be install ? I am using rhel as4 , redhat cluster
> suite4 and oracle 10g rac. I want to build a   cluster
> in os level as well as database level.
>
> I am using follwing Hardware:----
>
> 1> Hp dl380g4 x 2nos
> 2> Hp Msa500 storage x 1 no
>
> what are the steps for the follwing installation.Pl
> send  me a proper document for the above.
>
>
>
> regards
>
> Suvankar Moitra
>
>
> --- Rajesh singh <singh.rajeshwar at gmail.com> wrote:
>
>   
>> Dear Suvankar,
>> Its not quite clear as to what is your requirement.
>> do you have a copy of RHE 3/4 and a copy of cluster
>> suite
>> if yes than install OS on either server, connect
>> MSA500, install cluster
>> suite on both machine and configure cluster.
>> regards
>>
>> On 1/10/06, Suvankar Moitra
>> <suvankar_moitra at pcstech.com> wrote:
>>     
>>> Hi,
>>>
>>>
>>> I'm looking for an installing and configuring
>>>       
>> procedure for RedHat Cluster
>>     
>>> suite .
>>> We have 2 servers HP Proliant DL380 G4 (RedHat
>>>       
>> Advanced Server 4) attached
>>     
>>> by  HP MSA500 storage and we want to install and
>>>       
>> configure Cluster suite .
>>     
>>> Thanks.
>>>
>>>
>>> Suvankar
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>>
>>>       
> https://www.redhat.com/mailman/listinfo/linux-cluster
>   
>>> --
>>>       
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>>
>>     
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> DISCLAIMER:
> --------------------------------------------------------------------------------------------------------------
>
> This e-mail contains confidential and/or privileged information. If you are not the intended recipient
> (or have received this e-mail in error)please notify the sender immediately and destroy this e-mail.
> Any unauthorized copying, disclosure, use or distribution of the material in this e-mail is strictly forbidden.
> ---------------------------------------------------------------------------------------------------------------
>
>   


-- 
CIRIL IGNATIOUS T
R & D ENGINEER
HCL INFOSYSTEMS LTD
PONDICHERRY
PH:09894027005

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060201/3b7615f0/attachment.htm>

From suvankar_moitra at yahoo.com  Wed Feb  1 06:04:21 2006
From: suvankar_moitra at yahoo.com (SUVANKAR MOITRA)
Date: Tue, 31 Jan 2006 22:04:21 -0800 (PST)
Subject: [Linux-cluster] using GNBD with other cluster filesystem
In-Reply-To: <4c8edc920601302040n55d02bf3lceeb16b1528ead2a@mail.gmail.com>
Message-ID: <20060201060421.79679.qmail@web52306.mail.yahoo.com>

dear gourav,

Today i am installing the above if iam sucessful i
will send the doc to u.

regards

Suvankar
--- gaurav <linux4all at gmail.com> wrote:

> I had not exlored about using iSCSI with Oracle RAC.
> i am now looking into that more seriously.(pl. pass
> on any links if u have )
> 
> Yes, since this is only for testing purposes,so i do
> not intend to use
> multiple GNBD servers so i would not require to use
> the -c (caching)
> option
> 
> regards
> 
> gaurav
> 
> 
> >hi
> >If you use gnbd to export storage, the gnbd clients
> should view this as a
> >regular shared block device. You are definitely
> able to put other filesystems
> >on top of GNBD... with some caveats. GNBD is tied
> pretty closely with the
> >RHCS cluster manager. The only way you can export
> devices uncached is with
> >a cluster set up.  If you don't need to export
> uncached devices, you can use
> >the -n (no cluster) option to avoid setting up a
> RHCS cluster.
> 
> >When devices are exported in cached mode, reads
> will use the buffer cache on
> >the GNBD server. This is a problem if you want to
> use the exported block device
> >directly on the server, or want to export the same
> shared storage device from
> >multiple gnbd servers. If you aren't doing either
> of those, you should be able
> >to use the -n option.
> 
> >-Ben
> 
> >Another alternative would be to use iscsi, with a
> software iscsi target.
> 
> 
> --
> my site:http://www.gnulinuxclub.org
> my blog:http://linux4all.blogspot.com
> my project:http://masand.sourceforge.net
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From libregeek at gmail.com  Wed Feb  1 08:09:11 2006
From: libregeek at gmail.com (Manilal K M)
Date: Wed, 1 Feb 2006 13:39:11 +0530
Subject: [Linux-cluster] using GNBD with other cluster filesystem
In-Reply-To: <20060201060421.79679.qmail@web52306.mail.yahoo.com>
References: <4c8edc920601302040n55d02bf3lceeb16b1528ead2a@mail.gmail.com>
	<20060201060421.79679.qmail@web52306.mail.yahoo.com>
Message-ID: <2315046d0602010009s24ba1613n@mail.gmail.com>

On 01/02/06, SUVANKAR MOITRA <suvankar_moitra at yahoo.com> wrote:
> dear gourav,
>
> Today i am installing the above if iam sucessful i
> will send the doc to u.
>
Remember to send it to the mailing list



From brodriguezb at fujitsu.es  Wed Feb  1 11:38:42 2006
From: brodriguezb at fujitsu.es (=?UTF-8?B?QmFydG9sb23DqSBSb2Ryw61ndWV6?=)
Date: Wed, 01 Feb 2006 12:38:42 +0100
Subject: [Linux-cluster] journal size gfs.
Message-ID: <43E09DC2.6010306@fujitsu.es>

    Hello, i dont remember the journals size of my gfs partitions, how 
can i get it? id like add journal with the same size. I have used 
gfs_tool jindex but i dont see anything.

    Thanks and regards.
    Bart.

-- 
________________________________________

Bartolom? Rodr?guez Bordallo
Departamento de Explotaci?n de Servicios

FUJITSU ESPA?A SERVICES, S.A.U.
Camino Cerro de los Gamos, 1
28224 Pozuelo de Alarc?n, Madrid
Tel.: 902 11 40 10
Mail: brodriguezb at fujitsu.es

________________________________________


La informaci?n contenida en este e-mail es confidencial y va dirigida ?nicamente al receptor que aparece como destinatario. Si ha recibido este e-mail por error, por favor, notif?quenoslo inmediatamente y b?rrelo de su sistema. Por favor, en tal caso, no lo copie ni lo use para ning?n prop?sito, ni revele sus contenidos a ninguna persona ni lo almacene ni copie esta informaci?n en ning?n medio.



From depeecmr at yahoo.com  Wed Feb  1 11:43:21 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Wed, 1 Feb 2006 03:43:21 -0800 (PST)
Subject: [Linux-cluster] GFS over WAN link implementation
In-Reply-To: <43E09DC2.6010306@fujitsu.es>
Message-ID: <20060201114321.10388.qmail@web30207.mail.mud.yahoo.com>

Everyone,

I have this setup (please check attached picture).

WAN  link has dedicated 2048 Kbps (with average 14
miliseconds between nodes )
LB 1, 2, 3  Servers are Load Balancers
Node 1, 2, 3, 4 are Redhat Enterprise linux ES v4 +
GFS
MSA1000 storage has 30 Gig of Data right now, and can
grow beyond 100Gigs.

Can someone help me out to figure the best GFS
configuration to have:
1- Mirrored storage over the Wan Link ?
2- What options can be setup for asynchroneous
replication of the storrage if  the WAN link is
becomes too thin for synchronous replication ?
3- Nodes 3 & 4 fail over Nodes 1 & 2, supposing that
Node 1 and 2 are unavailable ?
4- Does GFS over DRBD work at this time ? and have
someone implement it ?

Thanks for your help,

Dan


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From gforte at leopard.us.udel.edu  Wed Feb  1 14:49:40 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Wed, 01 Feb 2006 09:49:40 -0500
Subject: [Linux-cluster] verbose clurgmgrd output?
In-Reply-To: <1138744624.4371.370.camel@ayanami.boston.redhat.com>
References: <43DE9133.6000409@leopard.us.udel.edu>
	<1138744624.4371.370.camel@ayanami.boston.redhat.com>
Message-ID: <43E0CA84.9020806@leopard.us.udel.edu>

Lon Hohberger wrote:
> On Mon, 2006-01-30 at 17:20 -0500, Greg Forte wrote:
>> Is there any way to get clurgmgrd to output more verbosely, esp. 
>> whatever messages there might have been from scripts it's managing? 
>> I've got several cluster services that don't behave properly, in one 
>> case if I run the script manually (i.e. sudo /etc/init.d/scriptname), it 
>> works fine, but when clurgmgrd trys it says it "returned 1 (generic 
>> error)".  Another appears to run perfectly fine (no error output from 
>> clurgmgrd), but as soon as it's up the manager stops it again, then 
>> repeats the cycle.  Again, running it manually works fine.  These are 
>> homespun scripts, so it's quite possible I'm missing something basic, 
>> but I can't figure out what (obviously ;-)
> 
> Hi Greg,
> 
> Try adding log_level="7" to the <rm> tag if you're using current
> STABLE/RHEL4 branches (don't forget to update the configuration
> version).  Note that older versions of the resource scripts don't have
> much in the way of logging.

Hmmm ... tried this, but no apparent change.  This would log into 
/var/log/messages, correct?  I am running the latest RHEL4 packages.

> Also, you can run "clurgmgrd -d", but that requires a restart.

same result.

> As an alternative -- you can run fun tests on the service manually:
> 
> clusvcadm -d <servicename>  (service must be disabled!)
> 
> rg_test test /etc/cluster/cluster.conf start service <servicename>
> rg_test test /etc/cluster/cluster.conf status service <servicename>
> rg_test test /etc/cluster/cluster.conf stop service <servicename>

rg_test segfaults immediately, no matter what I try.  ;-)

> (p.s. Say 'Hi' to mikeyp and monogoose for me if you see them. ^.^)

I know mikeyp, but no clue who monogoose is ...

-g

Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From gforte at leopard.us.udel.edu  Wed Feb  1 14:57:05 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Wed, 01 Feb 2006 09:57:05 -0500
Subject: [Linux-cluster] verbose clurgmgrd output?
In-Reply-To: <43E0CA84.9020806@leopard.us.udel.edu>
References: <43DE9133.6000409@leopard.us.udel.edu>	<1138744624.4371.370.camel@ayanami.boston.redhat.com>
	<43E0CA84.9020806@leopard.us.udel.edu>
Message-ID: <43E0CC41.3090502@leopard.us.udel.edu>

Never mind, I'm an idiot - none of my scripts support the "status" 
command, so of course the resource manager is going to fail them ... ugh.

In my defense, though, I haven't found any documentation describing 
minimum requirements for a CS-compatible init script.  Does any exist?

Still be interesting to know what's up with rg_test, too ...

-g

Greg Forte wrote:
> Lon Hohberger wrote:
>> On Mon, 2006-01-30 at 17:20 -0500, Greg Forte wrote:
>>> Is there any way to get clurgmgrd to output more verbosely, esp. 
>>> whatever messages there might have been from scripts it's managing? 
>>> I've got several cluster services that don't behave properly, in one 
>>> case if I run the script manually (i.e. sudo /etc/init.d/scriptname), 
>>> it works fine, but when clurgmgrd trys it says it "returned 1 
>>> (generic error)".  Another appears to run perfectly fine (no error 
>>> output from clurgmgrd), but as soon as it's up the manager stops it 
>>> again, then repeats the cycle.  Again, running it manually works 
>>> fine.  These are homespun scripts, so it's quite possible I'm missing 
>>> something basic, but I can't figure out what (obviously ;-)
>>
>> Hi Greg,
>>
>> Try adding log_level="7" to the <rm> tag if you're using current
>> STABLE/RHEL4 branches (don't forget to update the configuration
>> version).  Note that older versions of the resource scripts don't have
>> much in the way of logging.
> 
> Hmmm ... tried this, but no apparent change.  This would log into 
> /var/log/messages, correct?  I am running the latest RHEL4 packages.
> 
>> Also, you can run "clurgmgrd -d", but that requires a restart.
> 
> same result.
> 
>> As an alternative -- you can run fun tests on the service manually:
>>
>> clusvcadm -d <servicename>  (service must be disabled!)
>>
>> rg_test test /etc/cluster/cluster.conf start service <servicename>
>> rg_test test /etc/cluster/cluster.conf status service <servicename>
>> rg_test test /etc/cluster/cluster.conf stop service <servicename>
> 
> rg_test segfaults immediately, no matter what I try.  ;-)
> 
>> (p.s. Say 'Hi' to mikeyp and monogoose for me if you see them. ^.^)
> 
> I know mikeyp, but no clue who monogoose is ...
> 
> -g
> 
> Greg Forte
> gforte at udel.edu
> IT - User Services
> University of Delaware
> 302-831-1982
> Newark, DE
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From Alain.Moulle at bull.net  Wed Feb  1 16:03:13 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 01 Feb 2006 17:03:13 +0100
Subject: [Linux-cluster] CS4/ Heart-beat configuration (contd.)
Message-ID: <43E0DBC1.7070504@bull.net>

Alain Moulle wrote:

>> Hi
>>
>> Is there a way to force the CS4 to use
>> another interface for Heart-Beat than
>> the one linked to the hostname ?
>> And if so, how to ?


>cman_tool join -n <other hostname>
>see cman_tool -h (or the man page) for a list of options.
>-- patrick

Hi
OK I have given this a try, and this does not work :
in fact, it returns : <other_name> not know in cluster.conf
or something likewise.
Knowing that, I have tried another thing :
set a name on eth1 in /etc/hosts such as :
10.0.0.2  nodehb #for heart beat
where is eth0 remains :
10.0.0.1  node (which also is the hostname)
Then in cluster.conf, I have changed all "name" values
from "node" to "nodehb"
And this seems to work : I mean the Heart-beat
frames are effectively running through eth1 now
and I have tried to failover by doing ifdown eth1 on
the first node and the failover has been successful.

So it seems that the Heart-Beat is not linked
to the IF linked to the hostname, but really linked
only to the names we put in cluster.conf.

Could someone confirm that point ?
And eventually tell me which eventual drawback I could
face to adopt this configuration ?

Thanks a lot
Alain Moull?





From pcaulfie at redhat.com  Wed Feb  1 17:20:01 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 01 Feb 2006 17:20:01 +0000
Subject: [Linux-cluster] CS4/ Heart-beat configuration (contd.)
In-Reply-To: <43E0DBC1.7070504@bull.net>
References: <43E0DBC1.7070504@bull.net>
Message-ID: <43E0EDC1.20400@redhat.com>

Alain Moulle wrote:
> Alain Moulle wrote:
> 
>>> Hi
>>>
>>> Is there a way to force the CS4 to use
>>> another interface for Heart-Beat than
>>> the one linked to the hostname ?
>>> And if so, how to ?
> 
> 
>> cman_tool join -n <other hostname>
>> see cman_tool -h (or the man page) for a list of options.
>> -- patrick
> 
> Hi
> OK I have given this a try, and this does not work :
> in fact, it returns : <other_name> not know in cluster.conf
> or something likewise.
> Knowing that, I have tried another thing :
> set a name on eth1 in /etc/hosts such as :
> 10.0.0.2  nodehb #for heart beat
> where is eth0 remains :
> 10.0.0.1  node (which also is the hostname)
> Then in cluster.conf, I have changed all "name" values
> from "node" to "nodehb"
> And this seems to work : I mean the Heart-beat
> frames are effectively running through eth1 now
> and I have tried to failover by doing ifdown eth1 on
> the first node and the failover has been successful.
> 
> So it seems that the Heart-Beat is not linked
> to the IF linked to the hostname, but really linked
> only to the names we put in cluster.conf.

It's related to both, as you found....

> Could someone confirm that point ?
> And eventually tell me which eventual drawback I could
> face to adopt this configuration ?
> 

No, that's almost exactly what you're supposed to do :)
-- 

patrick



From Leonardo.Mello at planejamento.gov.br  Wed Feb  1 17:45:41 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Wed, 1 Feb 2006 15:45:41 -0200
Subject: [Linux-cluster] GFS performance analysis
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255AC4@corp-bsa-mp01.planejamento.gov.br>

We are doing some research here in the brazilian government in distributed mass storage.

one of the scenarios that will be study involve gfs, and we will test gfs performance localy, exported to another machine via gnbd, iscsi and enbd.

we found some weird results, where my disks localy can reach over 300MB/sec using ext3, and the same test using gfs reach only 38MB/sec.  

There is one graphic about this analysis in the page, and results for dbench and iozone.

In this moment we are doing benchmarks using bonnie++ to certify that this weird performance arent relate to the tools we had used.

Other consideration is that turning off hyperthread improved a litle (6-10MB/s) the performance, i believe this is a bug, or one problem in design of gfs.

is there a way to improve the gfs performance ? or this poor performance is all that i can get?  

Leonardo Mello 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060201/c2308633/attachment.htm>

From Leonardo.Mello at planejamento.gov.br  Wed Feb  1 17:50:17 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Wed, 1 Feb 2006 15:50:17 -0200
Subject: [Linux-cluster] RE: GFS performance analysis
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255AC5@corp-bsa-mp01.planejamento.gov.br>

Sorry i forgot the url:

http://guialivre.governoeletronico.gov.br/mediawiki/index.php/TestesGFS

----------------------------------------------------------------------------
We are doing some research here in the brazilian government in distributed mass storage.

one of the scenarios that will be study involve gfs, and we will test gfs performance localy, exported to another machine via gnbd, iscsi and enbd.

we found some weird results, where my disks localy can reach over 300MB/sec using ext3, and the same test using gfs reach only 38MB/sec.  

There is one graphic about this analysis in the page, and results for dbench and iozone.

In this moment we are doing benchmarks using bonnie++ to certify that this weird performance arent relate to the tools we had used.

Other consideration is that turning off hyperthread improved a litle (6-10MB/s) the performance, i believe this is a bug, or one problem in design of gfs.

is there a way to improve the gfs performance ? or this poor performance is all that i can get?  

Leonardo Mello 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060201/c41db0c2/attachment.htm>

From lhh at redhat.com  Wed Feb  1 18:09:26 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 01 Feb 2006 13:09:26 -0500
Subject: [Linux-cluster] verbose clurgmgrd output?
In-Reply-To: <43E0CC41.3090502@leopard.us.udel.edu>
References: <43DE9133.6000409@leopard.us.udel.edu>
	<1138744624.4371.370.camel@ayanami.boston.redhat.com>
	<43E0CA84.9020806@leopard.us.udel.edu>
	<43E0CC41.3090502@leopard.us.udel.edu>
Message-ID: <1138817366.4371.380.camel@ayanami.boston.redhat.com>

On Wed, 2006-02-01 at 09:57 -0500, Greg Forte wrote:
> Never mind, I'm an idiot - none of my scripts support the "status" 
> command, so of course the resource manager is going to fail them ... ugh.
> 
> In my defense, though, I haven't found any documentation describing 
> minimum requirements for a CS-compatible init script.  Does any exist?

Pretty sure any script correctly implementing the LSB-spec for init
scripts should work. Check here:

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

-- Lon



From lhh at redhat.com  Wed Feb  1 18:29:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 01 Feb 2006 13:29:38 -0500
Subject: [Linux-cluster] verbose clurgmgrd output?
In-Reply-To: <43E0CA84.9020806@leopard.us.udel.edu>
References: <43DE9133.6000409@leopard.us.udel.edu>
	<1138744624.4371.370.camel@ayanami.boston.redhat.com>
	<43E0CA84.9020806@leopard.us.udel.edu>
Message-ID: <1138818578.4371.390.camel@ayanami.boston.redhat.com>

On Wed, 2006-02-01 at 09:49 -0500, Greg Forte wrote:

> Hmmm ... tried this, but no apparent change.  This would log into 
> /var/log/messages, correct?  I am running the latest RHEL4 packages.

Yes, and they're not fixed in the current RHEL4 packages, but it will be
in the next update.

> > Also, you can run "clurgmgrd -d", but that requires a restart.
> 
> same result.

/var/log/messages filters out debug info.  Keep using '-d' for now and
add:

daemon.* /var/log/daemon-log 

to /etc/syslog.conf.  This should trap all messages from rgmanager,
including debug info.

> 
> > As an alternative -- you can run fun tests on the service manually:
> > 
> > clusvcadm -d <servicename>  (service must be disabled!)
> > 
> > rg_test test /etc/cluster/cluster.conf start service <servicename>
> > rg_test test /etc/cluster/cluster.conf status service <servicename>
> > rg_test test /etc/cluster/cluster.conf stop service <servicename>
> 
> rg_test segfaults immediately, no matter what I try.  ;-)

... x86_64 ?

See bz #177340.  It's fixed in current CVS in all branches (or should
be), and will also be fixed in the next update of the RHEL4 packages.


> > (p.s. Say 'Hi' to mikeyp and monogoose for me if you see them. ^.^)
> 
> I know mikeyp, but no clue who monogoose is ...

One of mikeyp's minions.

-- Lon



From gforte at leopard.us.udel.edu  Wed Feb  1 18:36:13 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Wed, 01 Feb 2006 13:36:13 -0500
Subject: [Linux-cluster] verbose clurgmgrd output?
In-Reply-To: <1138817366.4371.380.camel@ayanami.boston.redhat.com>
References: <43DE9133.6000409@leopard.us.udel.edu>	<1138744624.4371.370.camel@ayanami.boston.redhat.com>	<43E0CA84.9020806@leopard.us.udel.edu>	<43E0CC41.3090502@leopard.us.udel.edu>
	<1138817366.4371.380.camel@ayanami.boston.redhat.com>
Message-ID: <43E0FF9D.3010704@leopard.us.udel.edu>

Cool, thanks.

Heh, here's an interesting quote:

"In addition to straightforward success, the following situations are 
also to be considered successful: ... running stop on a service already 
stopped or not running ..."

This is exactly what I was complaining/asking about a few days ago - the 
"stock" httpd init script in RHEL4 returns non-zero if you try to stop 
it when it's not running.  This in turn causes clusvcadm -d to fail if 
you try to disable it when it's not running, so you have to manually 
start it "non-clusterfied" before you can disable it in the cluster!  I 
haven't done a survey, but I'm guessing other scripts behave similarly.

But the spec is self-contradictory, because further down it defines an 
error status code for "program is not running" that it says it to be 
used for any action other than 'status'!

So ... is this a bug in the script or a "bug" in the document?

-g

Lon Hohberger wrote:
> On Wed, 2006-02-01 at 09:57 -0500, Greg Forte wrote:
>> Never mind, I'm an idiot - none of my scripts support the "status" 
>> command, so of course the resource manager is going to fail them ... ugh.
>>
>> In my defense, though, I haven't found any documentation describing 
>> minimum requirements for a CS-compatible init script.  Does any exist?
> 
> Pretty sure any script correctly implementing the LSB-spec for init
> scripts should work. Check here:
> 
> http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
> 
> -- Lon
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From aconrad.tlv at magic.fr  Wed Feb  1 19:20:00 2006
From: aconrad.tlv at magic.fr (Alexandre CONRAD)
Date: Wed, 01 Feb 2006 20:20:00 +0100
Subject: [Linux-cluster] distributed database on multiple servers/disks
Message-ID: <43E109E0.5090402@magic.fr>

Hello,

I'm thinking about setting up a MySQL database clustered (but not like 
MySQL Cluster which kinda just does mirroring AFAIK).

I want it to be partionned on multiple disks/servers.

Here is my apporch:

              +-------------------+
              | Application (PHP) |
              +-------------------+
                       ^
                       |
                       |
                       v
                +--------------+
                | MySQL Server |
                +--------------+
                       ^
                       |
                       |
                       v
+--------------- logical disk ----------------+
|                     ^                       |
|                     |                       |
|        +----LAN-----+----LAN-----+          |
|        |            |            |          |
|        v            v            v          |
|   +----------+ +----------+ +----------+    |
|   | physical | | physical | | physical |    |
|   | server 1 | | server 2 | | server 3 |    |
|   +----------+ +----------+ +----------+    |
|                                             |
+---------------------------------------------+

The idea is to split the data on 3 physical servers when data is 
written. So each server will have 1/3 of the data on their disk. It 
would work as a RAID 5 system.

In case of full disk space, you would add a new physical server (node) 
and the data would be redistributed to all servers, each getting 1/4 of 
the data.

In case of server failure, because of distributed parity (RAID5), it 
could keep going.

Or alternativly think about a RAID0+1 configuration.

Do you have any idea if this could be achived using LVM, RH Cluster 
Suite or any other clustering / partitioning system ?

Regards,
-- 
Alexandre CONRAD - TLV
Research & Development
tel : +33 1 30 80 55 05
fax : +33 1 30 80 55 06
6, rue de la plaine
78860 - SAINT NOM LA BRETECHE
FRANCE



From gkapitany at rogers.com  Wed Feb  1 19:57:56 2006
From: gkapitany at rogers.com (GABRIEL KAPITANY)
Date: Wed, 1 Feb 2006 14:57:56 -0500 (EST)
Subject: [Linux-cluster] distributed database on multiple servers/disks
In-Reply-To: <43E109E0.5090402@magic.fr>
Message-ID: <20060201195756.10877.qmail@web88106.mail.re2.yahoo.com>

Hi,

another approach, not clustering, which might give you
a distributed database:
https://forge.continuent.org/projects/sequoia/

Gabriel
--- Alexandre CONRAD <aconrad.tlv at magic.fr> wrote:

> Hello,
> 
> I'm thinking about setting up a MySQL database
> clustered (but not like 
> MySQL Cluster which kinda just does mirroring
> AFAIK).
> 
> I want it to be partionned on multiple
> disks/servers.
> 
> Here is my apporch:
> 
>               +-------------------+
>               | Application (PHP) |
>               +-------------------+
>                        ^
>                        |
>                        |
>                        v
>                 +--------------+
>                 | MySQL Server |
>                 +--------------+
>                        ^
>                        |
>                        |
>                        v
> +--------------- logical disk ----------------+
> |                     ^                       |
> |                     |                       |
> |        +----LAN-----+----LAN-----+          |
> |        |            |            |          |
> |        v            v            v          |
> |   +----------+ +----------+ +----------+    |
> |   | physical | | physical | | physical |    |
> |   | server 1 | | server 2 | | server 3 |    |
> |   +----------+ +----------+ +----------+    |
> |                                             |
> +---------------------------------------------+
> 
> The idea is to split the data on 3 physical servers
> when data is 
> written. So each server will have 1/3 of the data on
> their disk. It 
> would work as a RAID 5 system.
> 
> In case of full disk space, you would add a new
> physical server (node) 
> and the data would be redistributed to all servers,
> each getting 1/4 of 
> the data.
> 
> In case of server failure, because of distributed
> parity (RAID5), it 
> could keep going.
> 
> Or alternativly think about a RAID0+1 configuration.
> 
> Do you have any idea if this could be achived using
> LVM, RH Cluster 
> Suite or any other clustering / partitioning system
> ?
> 
> Regards,
> -- 
> Alexandre CONRAD - TLV
> Research & Development
> tel : +33 1 30 80 55 05
> fax : +33 1 30 80 55 06
> 6, rue de la plaine
> 78860 - SAINT NOM LA BRETECHE
> FRANCE
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From Anthony.Assi at irisa.fr  Thu Feb  2 13:14:37 2006
From: Anthony.Assi at irisa.fr (Anthony Assi)
Date: Thu, 02 Feb 2006 14:14:37 +0100
Subject: [Linux-cluster] RE: GFS performance analysis
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255AC5@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255AC5@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <43E205BD.2070301@irisa.fr>

Leonardo Rodrigues de Mello wrote:

> Sorry i forgot the url:
>
> http://guialivre.governoeletronico.gov.br/mediawiki/index.php/TestesGFS
>
> ----------------------------------------------------------------------------
> We are doing some research here in the brazilian government in 
> distributed mass storage.
>
> one of the scenarios that will be study involve gfs, and we will test 
> gfs performance localy, exported to another machine via gnbd, iscsi 
> and enbd.
>
> we found some weird results, where my disks localy can reach over 
> 300MB/sec using ext3, and the same test using gfs reach only 38MB/sec. 
>
> There is one graphic about this analysis in the page, and results for 
> dbench and iozone.
>
> In this moment we are doing benchmarks using bonnie++ to certify that 
> this weird performance arent relate to the tools we had used.
>
> Other consideration is that turning off hyperthread improved a litle 
> (6-10MB/s) the performance, i believe this is a bug, or one problem in 
> design of gfs.
>
> is there a way to improve the gfs performance ? or this poor 
> performance is all that i can get? 
>
> Leonardo Mello
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
Dear Leonardo,

We are facing the same problem on our cluster,
We are actually using the GULM lock system, and are hoping that by 
switching to the DLM locking system, we will resolve this problem.
Wich locking method are you using for GFS?

I will be gratefull, if you could keep me informed of your upcomming 
performance/solutions.

Regards,

-- 
Anthony Assi
DBA/System Administrator
Bio-Informatics Platform, Symbiose Team
IRISA - INRIA, Rennes, France
Tel: +33 2 99 84 71 58
http://genouest.org



From sp at linworx.cz  Thu Feb  2 13:58:17 2006
From: sp at linworx.cz (=?ISO-8859-2?Q?Stanislav_Pol=E1=B9ek?=)
Date: Thu, 02 Feb 2006 14:58:17 +0100
Subject: [Linux-cluster] two nodes -> tree nodes
Message-ID: <43E20FF9.4060406@linworx.cz>

Hi everybody. I would like to add third node to the running cluster
without restart of the cluster services. But when I try to join the
cluster with the third node, the other two refuse the requests with the
message "CMAN: join request from node requested, exceeds two node
limit". My cluster.conf does not explicitly state the two_node
parameter, but cman seems to run in that mode. What would be the right
way to add the third node without restart of the cluster services on the
other two?

Thank you very much

Stanislav

-- 
#--#--#--#--#--#--#--#--#--#--
Stanislav Polasek
RHCE #807302906006864
LinWorx s.r.o.
sp at linworx.cz / www.linworx.cz
#--#--#--#--#--#--#--#--#--#--



From leonardo.mello at planejamento.gov.br  Thu Feb  2 14:37:11 2006
From: leonardo.mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Thu, 02 Feb 2006 12:37:11 -0200
Subject: [Linux-cluster] RE: GFS performance analysis
In-Reply-To: <43E205BD.2070301@irisa.fr>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255AC5@corp-bsa-mp01.planejamen
	to.gov.br> <43E205BD.2070301@irisa.fr>
Message-ID: <1138891031.13467.10.camel@mp>

Dear Anthony,

	The tests that we report in the last message, were done localy in the
machine (without gnbd), and without lock (just one machine) using module
lock_nolock. We believe the performance problem we faced is about gfs
structure, not lock, but now, we are updating the webpage and results
with GNBD and GFS will be available soon, may be today.

	just to remember the page:
http://guialivre.governoeletronico.gov.br/seminario/index.php/TestesGFS

	Our objective here is to implement one Distributed Raid in a network
enviroment using gfs as the top filesystem. I had done that with success
about two years ago using ext3 at top filesystem. It was one distributed
raid in a network enviroment of 1.5 TB distributed in 4 machine servers
and 1 client mounting that raid. 

	Lets see how gfs performs, in the next tests

cya
-- 
.''`.   Leonardo Rodrigues de Mello
: :' :  Coordenador de Projetos em Cluster e Grid
`. `'   DSI/SLTI/MP
  `-    55 61 3313 1329

> Dear Leonardo,
> 
> We are facing the same problem on our cluster,
> We are actually using the GULM lock system, and are hoping that by 
> switching to the DLM locking system, we will resolve this problem.
> Wich locking method are you using for GFS?
> 
> I will be gratefull, if you could keep me informed of your upcomming 
> performance/solutions.
> 
> Regards,
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060202/a0eb52dd/attachment.sig>

From Anthony.Assi at irisa.fr  Thu Feb  2 15:36:20 2006
From: Anthony.Assi at irisa.fr (Anthony Assi)
Date: Thu, 02 Feb 2006 16:36:20 +0100
Subject: [Linux-cluster] /etc/cluster/cluster.conf
Message-ID: <43E226F4.7030001@irisa.fr>


 I am debugging a Performance problem we are having with our Cluster,
We have a 32 nodes cluster, using a GFS shared Volume,
I would like to know whether this is a good configuration of the 
"cluster.conf" File:



[root at genocluster-data symbiose]# more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="17" name="genouest">
        <fence_daemon clean_start="0" post_fail_delay="5" 
post_join_delay="15"/>
        <clusternodes>

                <clusternode name="genodata-data" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manual" 
nodename="genodata-data"/>
                                </method>
                        </fence>
                </clusternode>

                <clusternode name="genocluster-data" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd" 
nodename="genocluster-data"/>
                                </method>
                                <method name="2">
                                        <device name="manual" 
nodename="genocluster-data"/>
                                </method>
                        </fence>
                </clusternode>


                <clusternode name="genouest0-data" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd" 
nodename="genouest0-data"/>
                                </method>
                                <method name="2">
                                        <device name="manual" 
nodename="genouest0-data"/>
                                </method>
                        </fence>
                </clusternode>

.............................................................................Copy/Paste  
for : genouest1 till 
genouest31.............................................................................................
         
                <clusternode name="genouest31-data" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd" 
nodename="genouest31-data"/>
                                </method>
                                <method name="2">
                                        <device name="manual" 
nodename="genouest31-data"/>
                                </method>
                        </fence>
                </clusternode>


        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_gnbd" name="gnbd" 
servers="genodata-data"/>
                <fencedevice agent="fence_manual" name="manual"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
                <service name="GFS">
                        <clusterfs device="/dev/sdc" force_unmount="0" 
fstype="gfs" mountpoint="/bdspecs" name="bdspecs" options="num_glockd=32"/>
                        <clusterfs device="/dev/sdd" force_unmount="0" 
fstype="gfs" mountpoint="/home/genouest" name="genouest" 
options="num_glockd=32"/>
                        <clusterfs device="/dev/sde" force_unmount="0" 
fstype="gfs" mountpoint="/home/symbiose" name="symbiose" 
options="num_glockd=32"/>
                        <clusterfs device="/dev/sdf" force_unmount="0" 
fstype="gfs" mountpoint="/home/irisa" name="irisa" options="num_glockd=32"/>
                        <clusterfs device="/dev/sdg" force_unmount="0" 
fstype="gfs" mountpoint="/db" name="db" options="num_glockd=32"/>
                        <clusterfs device="/dev/sdh" force_unmount="0" 
fstype="gfs" mountpoint="/index" name="index" options="num_glockd=32"/>
                </service>
        </rm>
        <gulm>
                <lockserver name="genodata-data"/>
                <lockserver name="genocluster-data"/>
                <lockserver name="genouest0-data"/>
                <lockserver name="genouest1-data"/>
                <lockserver name="genouest2-data"/>
        </gulm>
</cluster>

-- 
Anthony Assi
DBA/System Administrator
INRIA: French National Institute for Research in Computer Science and Control
IRISA, Rennes, France
Tel: +33 2 99 84 71 58
http://www.irisa.fr/symbiose/index-eng.htm



From pcaulfie at redhat.com  Thu Feb  2 16:15:14 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 02 Feb 2006 16:15:14 +0000
Subject: [Linux-cluster] two nodes -> tree nodes
In-Reply-To: <43E20FF9.4060406@linworx.cz>
References: <43E20FF9.4060406@linworx.cz>
Message-ID: <43E23012.5010405@redhat.com>

Stanislav Pol??ek wrote:
> Hi everybody. I would like to add third node to the running cluster
> without restart of the cluster services. But when I try to join the
> cluster with the third node, the other two refuse the requests with the
> message "CMAN: join request from node requested, exceeds two node
> limit". My cluster.conf does not explicitly state the two_node
> parameter, but cman seems to run in that mode. What would be the right
> way to add the third node without restart of the cluster services on the
> other two?
> 


two_node="1" Must have been in the cluster.conf file when the cluster was
formed - that's the only way the flag inside CMAN can get set.

Once you have a 2 node cluster the only way to add a node is to remove all
nodes from the cluster and join them again without two_node set.

>From 3 node upwards you can add nodes as you please - it's just the transition
from 2 to 3 that needs a restart because 2 nodes is a special case.

-- 

patrick



From brilong at cisco.com  Thu Feb  2 16:40:34 2006
From: brilong at cisco.com (Brian Long)
Date: Thu, 02 Feb 2006 11:40:34 -0500
Subject: [Linux-cluster] Errata Mailing List?
Message-ID: <1138898434.4441.14.camel@brilong-lnx>

Hello,

I just joined the list and was wondering if there is a mailman list
similar to enterprise-watch-list that gives errata notices for GFS and
RHCS.

Thanks.

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From epeelea at gmail.com  Thu Feb  2 21:03:44 2006
From: epeelea at gmail.com (Daniel EPEE LEA)
Date: Thu, 2 Feb 2006 22:03:44 +0100
Subject: [Linux-cluster] Errata Mailing List?
In-Reply-To: <1138898434.4441.14.camel@brilong-lnx>
References: <1138898434.4441.14.camel@brilong-lnx>
Message-ID: <df22854e0602021303m2208d694g4cb3799b3126a660@mail.gmail.com>

Hi Brian,
I do not know about a list yet, but i know that if you have GFS
licenses or RHEL Entitlements you will receive daily errata notices
through your RHN account. That is how I get them.

Hope this helps

Daniel

On 2/2/06, Brian Long <brilong at cisco.com> wrote:
> Hello,
>
> I just joined the list and was wondering if there is a mailman list
> similar to enterprise-watch-list that gives errata notices for GFS and
> RHCS.
>
> Thanks.
>
> /Brian/
> --
>        Brian Long                      |         |           |
>        IT Data Center Systems          |       .|||.       .|||.
>        Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
>        Phone: (919) 392-7363           |   C i s c o   S y s t e m s
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From gforte at leopard.us.udel.edu  Thu Feb  2 23:23:57 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Thu, 02 Feb 2006 18:23:57 -0500
Subject: [Linux-cluster] dependencies between services
Message-ID: <43E2948D.3060108@leopard.us.udel.edu>

Is it possible to set up dependencies between cluster services?  That 
is, I have services A, B, C, and D.  B, C, D can't run unless A is 
running, but B, C, and D are all independent of each other and I want to 
be able to control them individually, i.e. be able to start/stop (or 
rather, enable/disable) each without affecting the others.  I know I 
could define them as dependent resources all in the same service, but 
then I can't have that independence between B, C, and D ... unless I'm 
missing something.

Of course, the fall-back approach is that I just define each as a 
separate service and make the init scripts for B, C, and D check to see 
that A is running (probably via a clustat | grep "A's name" kludge), but 
I'm hoping there's a better way, since this is ugly as hell and will 
only work if the service consists entirely of init script resources.

-g

Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From brilong at cisco.com  Fri Feb  3 13:05:11 2006
From: brilong at cisco.com (Brian Long)
Date: Fri, 03 Feb 2006 08:05:11 -0500
Subject: [Linux-cluster] Errata Mailing List?
In-Reply-To: <df22854e0602021303m2208d694g4cb3799b3126a660@mail.gmail.com>
References: <1138898434.4441.14.camel@brilong-lnx>
	<df22854e0602021303m2208d694g4cb3799b3126a660@mail.gmail.com>
Message-ID: <1138971911.4472.12.camel@brilong-lnx>

On Thu, 2006-02-02 at 22:03 +0100, Daniel EPEE LEA wrote:
> Hi Brian,
> I do not know about a list yet, but i know that if you have GFS
> licenses or RHEL Entitlements you will receive daily errata notices
> through your RHN account. That is how I get them.
> 
> Hope this helps
> 
> Daniel

Daniel,

I had forgotten about that since we've had it turned off for so long and
prefer to stay subscribed to enterprise-watch-list.  RHN only allows me
to enable/disable email notifications; I wish I could select just
certain channels.  Oh well, I'll re-enable for now and work with RH on
improvements.

Thanks.

/Brian/

> 
> On 2/2/06, Brian Long <brilong at cisco.com> wrote:
> > Hello,
> >
> > I just joined the list and was wondering if there is a mailman list
> > similar to enterprise-watch-list that gives errata notices for GFS and
> > RHCS.
> >
> > Thanks.
> >
> > /Brian/
> > --
> >        Brian Long                      |         |           |
> >        IT Data Center Systems          |       .|||.       .|||.
> >        Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
> >        Phone: (919) 392-7363           |   C i s c o   S y s t e m s
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From lhh at redhat.com  Fri Feb  3 15:18:28 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 03 Feb 2006 10:18:28 -0500
Subject: [Linux-cluster] /etc/cluster/cluster.conf
In-Reply-To: <43E226F4.7030001@irisa.fr>
References: <43E226F4.7030001@irisa.fr>
Message-ID: <1138979908.5992.52.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-02 at 16:36 +0100, Anthony Assi wrote:
>  I am debugging a Performance problem we are having with our Cluster,
> We have a 32 nodes cluster, using a GFS shared Volume,
> I would like to know whether this is a good configuration of the 
> "cluster.conf" File:

>         <rm>
>                 <failoverdomains/>
>                 <resources/>
>                 <service name="GFS">
>                         <clusterfs device="/dev/sdc" force_unmount="0" 
> fstype="gfs" mountpoint="/bdspecs" name="bdspecs" options="num_glockd=32"/>
>                         <clusterfs device="/dev/sdd" force_unmount="0" 
> fstype="gfs" mountpoint="/home/genouest" name="genouest" 
> options="num_glockd=32"/>
>                         <clusterfs device="/dev/sde" force_unmount="0" 
> fstype="gfs" mountpoint="/home/symbiose" name="symbiose" 
> options="num_glockd=32"/>
>                         <clusterfs device="/dev/sdf" force_unmount="0" 
> fstype="gfs" mountpoint="/home/irisa" name="irisa" options="num_glockd=32"/>
>                         <clusterfs device="/dev/sdg" force_unmount="0" 
> fstype="gfs" mountpoint="/db" name="db" options="num_glockd=32"/>
>                         <clusterfs device="/dev/sdh" force_unmount="0" 
> fstype="gfs" mountpoint="/index" name="index" options="num_glockd=32"/>
>                 </service>
>         </rm>


You really don't need rgmanager in this case.  The above rgmanager
configuration is just about equivalent to putting the GFS volumes
in /etc/fstab

-- Lon



From lhh at redhat.com  Fri Feb  3 15:39:47 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 03 Feb 2006 10:39:47 -0500
Subject: [Linux-cluster] dependencies between services
In-Reply-To: <43E2948D.3060108@leopard.us.udel.edu>
References: <43E2948D.3060108@leopard.us.udel.edu>
Message-ID: <1138981187.5992.62.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-02 at 18:23 -0500, Greg Forte wrote:
> Is it possible to set up dependencies between cluster services?  That 
> is, I have services A, B, C, and D.  B, C, D can't run unless A is 
> running, but B, C, and D are all independent of each other and I want to 
> be able to control them individually, i.e. be able to start/stop (or 
> rather, enable/disable) each without affecting the others.  I know I 
> could define them as dependent resources all in the same service, but 
> then I can't have that independence between B, C, and D ... unless I'm 
> missing something.

Not at the moment, but it should not be a difficult thing to add.

Could you file a bugzilla about it?

-- Lon




From omer at faruk.net  Sat Feb  4 20:21:22 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Sat, 4 Feb 2006 22:21:22 +0200 (EET)
Subject: [Linux-cluster] two nodes -> tree nodes
In-Reply-To: <43E23012.5010405@redhat.com>
References: <43E20FF9.4060406@linworx.cz> <43E23012.5010405@redhat.com>
Message-ID: <62448.85.103.165.73.1139084482.squirrel@85.103.165.73>

Then is it possible to create a cluster with 2 modes without setting
two_node="1" ? Thus it will enable us not to reboot our servers when
adding the third node.


> Stanislav Pol??ek wrote:
>> Hi everybody. I would like to add third node to the running cluster
>> without restart of the cluster services. But when I try to join the
>> cluster with the third node, the other two refuse the requests with the
>> message "CMAN: join request from node requested, exceeds two node
>> limit". My cluster.conf does not explicitly state the two_node
>> parameter, but cman seems to run in that mode. What would be the right
>> way to add the third node without restart of the cluster services on the
>> other two?
>>
>
>
> two_node="1" Must have been in the cluster.conf file when the cluster was
> formed - that's the only way the flag inside CMAN can get set.
>
> Once you have a 2 node cluster the only way to add a node is to remove all
> nodes from the cluster and join them again without two_node set.
>
>>From 3 node upwards you can add nodes as you please - it's just the
>> transition
> from 2 to 3 that needs a restart because 2 nodes is a special case.
>
> --
>
> patrick
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Omer Faruk Sen
http://www.faruk.net



From omer at faruk.net  Sat Feb  4 20:25:04 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Sat, 4 Feb 2006 22:25:04 +0200 (EET)
Subject: [Linux-cluster] fencing and node relation?
Message-ID: <52713.85.103.165.73.1139084704.squirrel@85.103.165.73>

Hi,

There is one issue that I don't grasp with fencing devices.. I will a 2
node cluster with dl140 (using fence_ipmilan) but what I don't understand
is how will I relate a cluster node with the fencing device that this
cluster node connected.


For example I have node1 and node2

node1: 192.168.1.1
node2: 192.168.1.2
node1_ipmi: 192.168.1.253
node2_ipmi: 192.168.1.254

using system-config-cluster I can't see how to relate node1 and node1_ipmi
so in the event of failure of node1 node2 can send node1_ipmi reboot
message




-- 
Omer Faruk Sen
http://www.faruk.net



From omer at faruk.net  Sat Feb  4 21:00:57 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Sat, 4 Feb 2006 23:00:57 +0200 (EET)
Subject: [Linux-cluster] fencing and node relation?
In-Reply-To: <52713.85.103.165.73.1139084704.squirrel@85.103.165.73>
References: <52713.85.103.165.73.1139084704.squirrel@85.103.165.73>
Message-ID: <59439.85.103.165.73.1139086857.squirrel@85.103.165.73>

I have found my own answer:

http://mirror.centos.org/centos/4/docs/html/rh-cs-en-4/s1-config-powercontroller.html#FIG-SOFT-PWRCTRLR

Select the !!! member for which you want to configure a power controller
connection !!! and click Manage Fencing For This Node. The Fence
Configuration dialog box is displayed as shown in Figure 3-8. Also taking
a look to usage.txt:

<clusternode name="nd02">
	<fence>
		<method name="single">
			<device name="apc1" port="2"/>
		</method>
	</fence>
</clusternode>

<clusternode name="nd03">
	<fence>
		<method name="single">
			<device name="human" nodename="nd03"/>
		</method>
	</fence>
</clusternode>


it shows that fencing devices are IN the clusternode knob..




> Hi,
>
> There is one issue that I don't grasp with fencing devices.. I will a 2
> node cluster with dl140 (using fence_ipmilan) but what I don't understand
> is how will I relate a cluster node with the fencing device that this
> cluster node connected.
>
>
> For example I have node1 and node2
>
> node1: 192.168.1.1
> node2: 192.168.1.2
> node1_ipmi: 192.168.1.253
> node2_ipmi: 192.168.1.254
>
> using system-config-cluster I can't see how to relate node1 and node1_ipmi
> so in the event of failure of node1 node2 can send node1_ipmi reboot
> message
>
>
>
>
> --
> Omer Faruk Sen
> http://www.faruk.net
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Omer Faruk Sen
http://www.faruk.net



From tristram at ubernet.co.nz  Sun Feb  5 09:22:39 2006
From: tristram at ubernet.co.nz (Tristram Cheer)
Date: Sun, 05 Feb 2006 22:22:39 +1300
Subject: [Linux-cluster] RGManager Hanging on shutdown
Message-ID: <43E5C3DF.2000601@ubernet.co.nz>

Hi folks,

I'm having issues with the shutdown of my cluster, background on our 
cluster is this.

We have 4 Compaq 8500r's with 8 x Xeon 800mhz with 3gb of Ram in each, 
we are using Xen 3.0.1 to run 16 VM's each with shared access via GFS to 
a GNBD Raid5 Export.

The last remaining issue i have to work out before we look at Testing 
and moving to production is this, /etc/init.d/rgmanger hangs when 
shutting down any node in the cluster, all i get on console is "Waiting 
for cluster services to stop", i have to kill the process to shutdown 
the system, this happens on our VM or on Real hardware so i dont think 
the issue lies with Xen.
Services running on each node are these

root at edward:~# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 2 1 4 5 6 7 8 9]

DLM Lock Space:  "shared"                            3   3 run       -
[3 2 9]

GFS Mount Group: "shared"                            4   4 run       -
[3 2 9]

User:            "usrm::manager"                     2   5 run       -
[3 2 1 4 5 6 7 8 9]

I cant figure out why this is happening, here is an strace from the shutdown
Any help or tips to fixing this would be great

root at edward:~# strace /etc/init.d/rgmanager stop
execve("/etc/init.d/rgmanager", ["/etc/init.d/rgmanager", "stop"], [/* 
17 vars */]) = 0
uname({sys="Linux", node="edward", ...}) = 0
brk(0)                                  = 0x80eb000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7f02000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or 
directory)
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7f00000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or 
directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=8432, ...}) = 0
old_mmap(NULL, 8432, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7efd000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or 
directory)
open("/lib/libncurses.so.5", O_RDONLY)  = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\342"..., 
512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=260524, ...}) = 0
old_mmap(NULL, 265868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 
3, 0) = 0xb7ebc000
old_mmap(0xb7ef4000, 36864, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x37000) = 0xb7ef4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or 
directory)
open("/lib/libdl.so.2", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\v\0"..., 
512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=8016, ...}) = 0
old_mmap(NULL, 10828, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 
0) = 0xb7eb9000
old_mmap(0xb7ebb000, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7ebb000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or 
directory)
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\306S\1"..., 
512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1131932, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7eb8000
old_mmap(NULL, 1141908, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 
3, 0) = 0xb7da1000
old_mmap(0xb7eb2000, 16384, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x110000) = 0xb7eb2000
old_mmap(0xb7eb6000, 7316, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7eb6000
close(3)                                = 0
munmap(0xb7efd000, 8432)                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3
close(3)                                = 0
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=290448, ...}) = 0
mmap2(NULL, 290448, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7d5a000
close(3)                                = 0
brk(0)                                  = 0x80eb000
brk(0x80ec000)                          = 0x80ec000
brk(0x80ed000)                          = 0x80ed000
brk(0x80ee000)                          = 0x80ee000
getuid32()                              = 0
getgid32()                              = 0
geteuid32()                             = 0
getegid32()                             = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
time(NULL)                              = 1139130793
brk(0x80ef000)                          = 0x80ef000
open("/etc/mtab", O_RDONLY)             = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=281, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7d59000
read(3, "/dev/hda1 / ext3 rw 0 0\nproc /pr"..., 4096) = 281
close(3)                                = 0
munmap(0xb7d59000, 4096)                = 0
open("/proc/meminfo", O_RDONLY)         = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7d59000
read(3, "MemTotal:        65728 kB\nMemFre"..., 1024) = 598
close(3)                                = 0
munmap(0xb7d59000, 4096)                = 0
brk(0x80f0000)                          = 0x80f0000
rt_sigaction(SIGCHLD, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
uname({sys="Linux", node="edward", ...}) = 0
brk(0x80f1000)                          = 0x80f1000
stat64("/root", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getpid()                                = 23624
open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No 
such file or directory)
open("/usr/lib/gconv/gconv-modules", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=45568, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0xb7d59000
read(3, "# GNU libc iconv configuration.\n"..., 4096) = 4096
brk(0x80f2000)                          = 0x80f2000
read(3, "lias\tJS//\t\t\tJUS_I.B1.002//\nalias"..., 4096) = 4096
brk(0x80f3000)                          = 0x80f3000
brk(0x80f4000)                          = 0x80f4000
brk(0x80f5000)                          = 0x80f5000
read(3, "ule\tINTERNAL\t\tISO-8859-3//\t\tISO8"..., 4096) = 4096
brk(0x80f6000)                          = 0x80f6000
brk(0x80f7000)                          = 0x80f7000
brk(0x80f8000)                          = 0x80f8000
read(3, "lias\tISO-IR-199//\t\tISO-8859-14//"..., 4096) = 4096
brk(0x80f9000)                          = 0x80f9000
brk(0x80fa000)                          = 0x80fa000
read(3, "\t\tto\t\t\tmodule\t\tcost\nalias\tCSEBCD"..., 4096) = 4096
brk(0x80fb000)                          = 0x80fb000
brk(0x80fc000)                          = 0x80fc000
brk(0x80fd000)                          = 0x80fd000
read(3, "ule\t\tcost\nalias\tCP284//\t\t\tIBM284"..., 4096) = 4096
brk(0x80fe000)                          = 0x80fe000
brk(0x80ff000)                          = 0x80ff000
brk(0x8100000)                          = 0x8100000
read(3, "lias\tCP864//\t\t\tIBM864//\nalias\t86"..., 4096) = 4096
brk(0x8101000)                          = 0x8101000
brk(0x8102000)                          = 0x8102000
brk(0x8103000)                          = 0x8103000
read(3, "module\tIBM937//\t\tINTERNAL\t\tIBM93"..., 4096) = 4096
brk(0x8104000)                          = 0x8104000
brk(0x8105000)                          = 0x8105000
brk(0x8106000)                          = 0x8106000
read(3, "\tEUC-JP//\nalias\tUJIS//\t\t\tEUC-JP/"..., 4096) = 4096
brk(0x8107000)                          = 0x8107000
brk(0x8108000)                          = 0x8108000
read(3, "module\t\tcost\nalias\tISO-IR-143//\t"..., 4096) = 4096
brk(0x8109000)                          = 0x8109000
brk(0x810a000)                          = 0x810a000
brk(0x810b000)                          = 0x810b000
read(3, "-BOX//\nmodule\tISO_10367-BOX//\t\tI"..., 4096) = 4096
brk(0x810c000)                          = 0x810c000
brk(0x810d000)                          = 0x810d000
brk(0x810e000)                          = 0x810e000
read(3, "module\tINTERNAL\t\tEUC-JISX0213//\t"..., 4096) = 512
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0xb7d59000, 4096)                = 0
brk(0x810f000)                          = 0x810f000
open("/usr/lib/gconv/ISO8859-1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\4\0\000"..., 
512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=5576, ...}) = 0
brk(0x8110000)                          = 0x8110000
old_mmap(NULL, 4368, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 
0) = 0xb7d58000
old_mmap(0xb7d59000, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7d59000
close(3)                                = 0
getppid()                               = 23623
getpgrp()                               = 23623
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/etc/init.d/rgmanager", O_RDONLY|O_LARGEFILE) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbff1698c) = -1 ENOTTY 
(Inappropriate ioctl for device)
_llseek(3, 0, [0], SEEK_CUR)            = 0
read(3, "#!/bin/sh\n\nPATH=/usr/local/sbin:"..., 80) = 80
_llseek(3, 0, [0], SEEK_SET)            = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl64(255, F_SETFD, FD_CLOEXEC)       = 0
fcntl64(255, F_GETFL)                   = 0x8000 (flags 
O_RDONLY|O_LARGEFILE)
fstat64(255, {st_mode=S_IFREG|0755, st_size=1319, ...}) = 0
_llseek(255, 0, [0], SEEK_CUR)          = 0
brk(0x8111000)                          = 0x8111000
brk(0x8112000)                          = 0x8112000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n\nPATH=/usr/local/sbin:"..., 1319) = 1319
brk(0x8113000)                          = 0x8113000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat64("/etc/default/rgmanager", {st_mode=S_IFREG|0644, st_size=19, 
...}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/etc/default/rgmanager", {st_mode=S_IFREG|0644, st_size=19, 
...}) = 0
open("/etc/default/rgmanager", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=19, ...}) = 0
read(3, "# RGMGR_OPTIONS=\"\"\n", 19)   = 19
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
brk(0x8114000)                          = 0x8114000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
brk(0x8115000)                          = 0x8115000
brk(0x8116000)                          = 0x8116000
brk(0x8117000)                          = 0x8117000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/usr/local/sbin/clulog", 0xbff14088) = -1 ENOENT (No such file 
or directory)
stat64("/usr/local/bin/clulog", 0xbff14088) = -1 ENOENT (No such file or 
directory)
stat64("/sbin/clulog", 0xbff14088)      = -1 ENOENT (No such file or 
directory)
stat64("/bin/clulog", 0xbff14088)       = -1 ENOENT (No such file or 
directory)
stat64("/usr/sbin/clulog", {st_mode=S_IFREG|0755, st_size=6632, ...}) = 0
stat64("/usr/sbin/clulog", {st_mode=S_IFREG|0755, st_size=6632, ...}) = 0
brk(0x8118000)                          = 0x8118000
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23625
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23625
waitpid(-1, 0xbff13dbc, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
write(1, "Stopping cluster service manager"..., 34Stopping cluster 
service manager:
) = 34
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/usr/local/sbin/start-stop-daemon", 0xbff15128) = -1 ENOENT (No 
such file or directory)
stat64("/usr/local/bin/start-stop-daemon", 0xbff15128) = -1 ENOENT (No 
such file or directory)
stat64("/sbin/start-stop-daemon", {st_mode=S_IFREG|0755, st_size=18520, 
...}) = 0
stat64("/sbin/start-stop-daemon", {st_mode=S_IFREG|0755, st_size=18520, 
...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23626
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23626
waitpid(-1, 0xbff14e5c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23627
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23627
waitpid(-1, 0xbff1412c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "Waiting for services to stop: ", 30Waiting for services to 
stop: ) = 30
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23628
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23628
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/usr/local/sbin/sleep", 0xbff13ce8) = -1 ENOENT (No such file or 
directory)
stat64("/usr/local/bin/sleep", 0xbff13ce8) = -1 ENOENT (No such file or 
directory)
stat64("/sbin/sleep", 0xbff13ce8)       = -1 ENOENT (No such file or 
directory)
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23629
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23629
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23630
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23630
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23631
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23631
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23632
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23632
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23633
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23633
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23637
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23637
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23638
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23638
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23639
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23639
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23640
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23640
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23641
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23641
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23642
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23642
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23643
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23643
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23644
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 23644
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbff13a9c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23645
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 23645
waitpid(-1, 0xbff1374c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGCHLD, {0x8078e75, [], 0}, {0x8078e75, [], 0}, 8) = 0
close(4)                                = 0
read(3, "4519 4478\n", 128)             = 10
read(3, "", 128)                        = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x8077b64, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("/bin/sleep", {st_mode=S_IFREG|0755, st_size=13920, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
fork()                                  = 23646
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8077b64, [], 0}, {SIG_DFL}, 8) = 0
waitpid(-1,  <unfinished ...>


Tristram Cheer



From omer at faruk.net  Sun Feb  5 17:41:27 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Sun, 5 Feb 2006 19:41:27 +0200 (EET)
Subject: [Linux-cluster] <crit> #12: RG xxcluster failed to stop;
	intervention required
Message-ID: <64526.85.103.165.73.1139161287.squirrel@85.103.165.73>

Hi,

I have done a simple test but redhat-cluster can't survive it. I have a 2
node cluster I have edited httpd.conf so it can't start (adding a few
characters to httpd.conf) and on node2 I have made a

/usr/sbin/clusvcadm -r cluster -m clu2

it move resources to clu2 perfectly but when I corrected httpd.conf on
clu1 and issue

/usr/sbin/clusvcadm -r ggcluster -m clu1

on node2 I get following errors:

Feb  5 16:25:19 clu1 clurgmgrd[2256]: <notice> stop on script "apache"
returned 1 (generic error)
Feb  5 16:25:19 clu1 clurgmgrd[2256]: <crit> #12: RG xxcluster failed to
stop; intervention required
Feb  5 16:25:19 clu1 clurgmgrd[2256]: <notice> Service xxcluster is failed

It shows that httpd service can't get started but when I manually start
apache it starts without problem ..

Can someone tell me what I am missing. It is a simple test and redhat
cluster didn't pass it.

I use  2 dl140 using fence_ipmilan. I can manually stop start and reboot
server using fence_ipmilan command. And here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="5" name="xx">
        <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="clu1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clu1-ilo"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="clu2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clu2-ilo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.98"
login="admin" name="clu1-ilo" passwd="admin"/>
                <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.99"
login="admin" name="clu2-ilo" passwd="admin"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="xx-cluster" ordered="0"
restricted="1">
                                <failoverdomainnode name="clu1"
priority="1"/>
                                <failoverdomainnode name="clu2"
priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.1.54" monitor_link="1"/>
                        <script file="/etc/rc.d/init.d/httpd" name="apache"/>
                        <script file="/etc/rc.d/init.d/vsftpd" name="ftp"/>
                        <script file="/etc/rc.d/init.d/mysqld" name="mysql"/>
                </resources>
                <service autostart="1" domain="xx-cluster" name="xxcluster">
                        <ip ref="192.168.1.54"/>
                        <script ref="apache"/>
                        <script ref="ftp"/>
                        <script ref="mysql"/>
                </service>
        </rm>
</cluster>




-- 
Omer Faruk Sen
http://www.faruk.net



From gforte at leopard.us.udel.edu  Sun Feb  5 18:03:15 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Sun, 05 Feb 2006 13:03:15 -0500
Subject: [Linux-cluster] <crit> #12: RG xxcluster failed to stop;
	intervention required
In-Reply-To: <64526.85.103.165.73.1139161287.squirrel@85.103.165.73>
References: <64526.85.103.165.73.1139161287.squirrel@85.103.165.73>
Message-ID: <43E63DE3.3080908@leopard.us.udel.edu>

yeah, I just had this same problem.  the stock apache init script
returns 1 if you (or the RGmanager) passes it 'stop' when apache's not
running.  the rgmanager attempts to stop the service before starting it
(to be sure it's not running), gets this "error" result, and gives up.
So ... you either have to start apache manually before trying to
reenable it in the cluster, or fix the init script so that it returns 0
(ideally, it should return 0 if you try to stop and it's not running,
but non-zero if you try to stop and it IS running and it fails to stop
for some reason ... but I didn't put that much effort into it ;-)

-g

Omer Faruk Sen wrote:
> Hi,
> 
> I have done a simple test but redhat-cluster can't survive it. I have a 2
> node cluster I have edited httpd.conf so it can't start (adding a few
> characters to httpd.conf) and on node2 I have made a
> 
> /usr/sbin/clusvcadm -r cluster -m clu2
> 
> it move resources to clu2 perfectly but when I corrected httpd.conf on
> clu1 and issue
> 
> /usr/sbin/clusvcadm -r ggcluster -m clu1
> 
> on node2 I get following errors:
> 
> Feb  5 16:25:19 clu1 clurgmgrd[2256]: <notice> stop on script "apache"
> returned 1 (generic error)
> Feb  5 16:25:19 clu1 clurgmgrd[2256]: <crit> #12: RG xxcluster failed to
> stop; intervention required
> Feb  5 16:25:19 clu1 clurgmgrd[2256]: <notice> Service xxcluster is failed
> 
> It shows that httpd service can't get started but when I manually start
> apache it starts without problem ..
> 
> Can someone tell me what I am missing. It is a simple test and redhat
> cluster didn't pass it.
> 
> I use  2 dl140 using fence_ipmilan. I can manually stop start and reboot
> server using fence_ipmilan command. And here is my cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster config_version="5" name="xx">
>         <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="clu1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="clu1-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="clu2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="clu2-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.98"
> login="admin" name="clu1-ilo" passwd="admin"/>
>                 <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.99"
> login="admin" name="clu2-ilo" passwd="admin"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="xx-cluster" ordered="0"
> restricted="1">
>                                 <failoverdomainnode name="clu1"
> priority="1"/>
>                                 <failoverdomainnode name="clu2"
> priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="192.168.1.54" monitor_link="1"/>
>                         <script file="/etc/rc.d/init.d/httpd" name="apache"/>
>                         <script file="/etc/rc.d/init.d/vsftpd" name="ftp"/>
>                         <script file="/etc/rc.d/init.d/mysqld" name="mysql"/>
>                 </resources>
>                 <service autostart="1" domain="xx-cluster" name="xxcluster">
>                         <ip ref="192.168.1.54"/>
>                         <script ref="apache"/>
>                         <script ref="ftp"/>
>                         <script ref="mysql"/>
>                 </service>
>         </rm>
> </cluster>
> 
> 
> 
> 



From pcaulfie at redhat.com  Mon Feb  6 08:42:12 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 06 Feb 2006 08:42:12 +0000
Subject: [Linux-cluster] two nodes -> tree nodes
In-Reply-To: <62448.85.103.165.73.1139084482.squirrel@85.103.165.73>
References: <43E20FF9.4060406@linworx.cz> <43E23012.5010405@redhat.com>
	<62448.85.103.165.73.1139084482.squirrel@85.103.165.73>
Message-ID: <43E70BE4.30305@redhat.com>

Omer Faruk Sen wrote:
> Then is it possible to create a cluster with 2 modes without setting
> two_node="1" ? Thus it will enable us not to reboot our servers when
> adding the third node.
> 

Yes, BUT if you lose one node then the other will stop because of lack of
quorum. The purpose of the two_node mode is to allow a single node to carry on
 working on its own - so it's not useful.

So that's a "no" really :)
-- 

patrick



From brodriguezb at fujitsu.es  Mon Feb  6 09:20:12 2006
From: brodriguezb at fujitsu.es (=?UTF-8?B?QmFydG9sb23DqSBSb2Ryw61ndWV6?=)
Date: Mon, 06 Feb 2006 10:20:12 +0100
Subject: [Linux-cluster] gfs_tool lockdump
Message-ID: <43E714CC.7080606@fujitsu.es>

    Hello, anyone know some manual to interpret the out of "gfs_tool 
lockdump /mount/point" ?
   
    Regards.

-- 
________________________________________

Bartolom? Rodr?guez Bordallo
Departamento de Explotaci?n de Servicios

FUJITSU ESPA?A SERVICES, S.A.U.
Camino Cerro de los Gamos, 1
28224 Pozuelo de Alarc?n, Madrid
Tel.: 902 11 40 10
Mail: brodriguezb at fujitsu.es

________________________________________


La informaci?n contenida en este e-mail es confidencial y va dirigida ?nicamente al receptor que aparece como destinatario. Si ha recibido este e-mail por error, por favor, notif?quenoslo inmediatamente y b?rrelo de su sistema. Por favor, en tal caso, no lo copie ni lo use para ning?n prop?sito, ni revele sus contenidos a ninguna persona ni lo almacene ni copie esta informaci?n en ning?n medio.



From depeecmr at yahoo.com  Mon Feb  6 21:14:30 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Mon, 6 Feb 2006 13:14:30 -0800 (PST)
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <43E714CC.7080606@fujitsu.es>
Message-ID: <20060206211430.45553.qmail@web30212.mail.mud.yahoo.com>

Hi Everyone,

Please find appended my cluster.conf file. I have 2
nodes (DL 580G2 + iLO + RHEL ES v4 + RHCS + GFS) + SAN

I am seting up a cluster, but when fenced stars, on
node1 (rs1.domain.com) first, node 2 (rs2.domain.com)
restart, and vice versa.  can someone help me fix
this, please


Much Regards,

Daniel


<?xml version="1.0"?>
<cluster config_version="10" name="cpanel_cluster">
        <fence_daemon clean_start="0"
post_fail_delay="0" post_join_delay="5"/>
        <clusternodes>
                <clusternode name="rs1.domain.com"
votes="1">
                        <fence>
                                <method name="1">
                                        <device
name="ilors1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="rs2.domain.com"
votes="1">
                        <fence>
                                <method name="1">
                                        <device
name="ilors2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo"
hostname="192.168.10.2" login="Ad
ministrator" name="ilors1" passwd="letmein"/>
                <fencedevice agent="fence_ilo"
hostname="192.168.10.3" login="Ad
ministrator" name="ilors2" passwd="letmein"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="cpanel"
ordered="1" restricted="1"
>
                                <failoverdomainnode
name="rs1.domain.com" priori
ty="1"/>
                                <failoverdomainnode
name="rs2.domain.com" priori
ty="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
        </rm>
</cluster>


-----------------------------------------------------------------------------
T O    G O D       B E        T H E         G L O R Y     :)
------------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From teigland at redhat.com  Mon Feb  6 21:26:48 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 6 Feb 2006 15:26:48 -0600
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <20060206211430.45553.qmail@web30212.mail.mud.yahoo.com>
References: <43E714CC.7080606@fujitsu.es>
	<20060206211430.45553.qmail@web30212.mail.mud.yahoo.com>
Message-ID: <20060206212645.GB11780@redhat.com>

On Mon, Feb 06, 2006 at 01:14:30PM -0800, Daniel EPEE LEA wrote:
> Hi Everyone,
> 
> Please find appended my cluster.conf file. I have 2
> nodes (DL 580G2 + iLO + RHEL ES v4 + RHCS + GFS) + SAN
> 
> I am seting up a cluster, but when fenced stars, on
> node1 (rs1.domain.com) first, node 2 (rs2.domain.com)
> restart, and vice versa.  can someone help me fix
> this, please

Are the nodes fencing each other when they start up?  If so, the way to
avoid this is to make sure both nodes have joined the cluster before
either node runs 'fence_tool join' (which starts fenced).  Further
explanation in fenced man page.

Dave



From depeecmr at yahoo.com  Mon Feb  6 21:38:05 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Mon, 6 Feb 2006 13:38:05 -0800 (PST)
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <20060206212645.GB11780@redhat.com>
Message-ID: <20060206213805.70938.qmail@web30214.mail.mud.yahoo.com>

Dave,
It's at startup. I got the 
Both node hang at "Service fenced start".

So I restarted node1. Node2 proceeded correctly once
node 1 was restarting. Then when fenced started in
node1, node 2 was fenced, and rebooted. :(


let me go through what you adviced.

Much regards

Daniel

--- David Teigland <teigland at redhat.com> wrote:

> On Mon, Feb 06, 2006 at 01:14:30PM -0800, Daniel
> EPEE LEA wrote:
> > Hi Everyone,
> > 
> > Please find appended my cluster.conf file. I have
> 2
> > nodes (DL 580G2 + iLO + RHEL ES v4 + RHCS + GFS) +
> SAN
> > 
> > I am seting up a cluster, but when fenced stars,
> on
> > node1 (rs1.domain.com) first, node 2
> (rs2.domain.com)
> > restart, and vice versa.  can someone help me fix
> > this, please
> 
> Are the nodes fencing each other when they start up?
>  If so, the way to
> avoid this is to make sure both nodes have joined
> the cluster before
> either node runs 'fence_tool join' (which starts
> fenced).  Further
> explanation in fenced man page.
> 
> Dave
> 
> 


-----------------------------------------------------------------------------
T O    G O D       B E        T H E         G L O R Y     :)
------------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From depeecmr at yahoo.com  Mon Feb  6 22:25:03 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Mon, 6 Feb 2006 14:25:03 -0800 (PST)
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <20060206212645.GB11780@redhat.com>
Message-ID: <20060206222503.24381.qmail@web30209.mail.mud.yahoo.com>

Dave,

I have set clean start ti "1" and both node started
correctly.

Now, at "clustat" I just see one node in the cluster.
How do I go from here ?

Shouldn't both nodes show up as members of the cluster
?


Thanks for your help.


----------
[root at rs2 ~]# clustat
Member Status: Quorate, Group Member

  Member Name                              State     
ID
  ------ ----                              -----     
--
  rs2.domain.com                           Online    
0x0000000000000001

------------
[root at rs2 ~]# service cman status
Protocol version: 5.0.1
Config version: 13
Cluster name: cpanel_cluster
Cluster ID: 47540
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 4
Node name: rs2.domain.com
Node addresses: A.B.C.42




Daniel

--- David Teigland <teigland at redhat.com> wrote:

> On Mon, Feb 06, 2006 at 01:14:30PM -0800, Daniel
> EPEE LEA wrote:
> > Hi Everyone,
> > 
> > Please find appended my cluster.conf file. I have
> 2
> > nodes (DL 580G2 + iLO + RHEL ES v4 + RHCS + GFS) +
> SAN
> > 
> > I am seting up a cluster, but when fenced stars,
> on
> > node1 (rs1.domain.com) first, node 2
> (rs2.domain.com)
> > restart, and vice versa.  can someone help me fix
> > this, please
> 
> Are the nodes fencing each other when they start up?
>  If so, the way to
> avoid this is to make sure both nodes have joined
> the cluster before
> either node runs 'fence_tool join' (which starts
> fenced).  Further
> explanation in fenced man page.
> 
> Dave
> 
> 


-----------------------------------------------------------------------------
T O    G O D       B E        T H E         G L O R Y     :)
------------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From teigland at redhat.com  Mon Feb  6 22:33:18 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 6 Feb 2006 16:33:18 -0600
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <20060206222503.24381.qmail@web30209.mail.mud.yahoo.com>
References: <20060206212645.GB11780@redhat.com>
	<20060206222503.24381.qmail@web30209.mail.mud.yahoo.com>
Message-ID: <20060206223318.GC11780@redhat.com>

On Mon, Feb 06, 2006 at 02:25:03PM -0800, Daniel EPEE LEA wrote:
> Dave,
> 
> I have set clean start ti "1" and both node started
> correctly.

You shouldn't use the clean start option, it's only for very special
cases.

> Now, at "clustat" I just see one node in the cluster.
> How do I go from here ?
> 
> Shouldn't both nodes show up as members of the cluster ?

Apparently your nodes can't see each other, use 'cman_tool nodes',
turn off firewalls.  This is your problem, once fixed then other
things like fencing should be ok.

Dave



From depeecmr at yahoo.com  Mon Feb  6 22:50:16 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Mon, 6 Feb 2006 14:50:16 -0800 (PST)
Subject: [Linux-cluster] help help,
	nodes restarting each other through HP iLO fencing and vice versa
In-Reply-To: <20060206223318.GC11780@redhat.com>
Message-ID: <20060206225016.12059.qmail@web30215.mail.mud.yahoo.com>

Dave,

You are the man! :) You were 100% right. I didn't need
clean start and I just desactivated iptables and
everything went just fine. now I can see both nodes.

Thanks a lot for your help Dave. I trully appreciate
that.

Much regards,

Daniel

----------------------------
[root at rs2 ~]# clustat
Member Status: Quorate, Group Member

  Member Name                              State     
ID
  ------ ----                              -----     
--
  rs1.domain.com                           Online    
0x0000000000000002
  rs2.domain.com                           Online    
0x0000000000000001
------------
[root at rs2 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   rs2.domain.com
   2    1    1   M   rs1.domain.com


--- David Teigland <teigland at redhat.com> wrote:

> On Mon, Feb 06, 2006 at 02:25:03PM -0800, Daniel
> EPEE LEA wrote:
> > Dave,
> > 
> > I have set clean start ti "1" and both node
> started
> > correctly.
> 
> You shouldn't use the clean start option, it's only
> for very special
> cases.
> 
> > Now, at "clustat" I just see one node in the
> cluster.
> > How do I go from here ?
> > 
> > Shouldn't both nodes show up as members of the
> cluster ?
> 
> Apparently your nodes can't see each other, use
> 'cman_tool nodes',
> turn off firewalls.  This is your problem, once
> fixed then other
> things like fencing should be ok.
> 
> Dave
> 
> 


-----------------------------------------------------------------------------
T O    G O D       B E        T H E         G L O R Y     :)
------------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From Anthony.Assi at irisa.fr  Tue Feb  7 12:37:22 2006
From: Anthony.Assi at irisa.fr (Anthony Assi)
Date: Tue, 07 Feb 2006 13:37:22 +0100
Subject: [Linux-cluster] /var/log/secure
Message-ID: <43E89482.3060600@irisa.fr>

Hi,

Can anyone tell me what does the following line in the /var/log/secure 
file means.


[genoServer]$ tail -f /var/log/secure
Feb  7 13:30:02 genoServer sshd[7750]: Did not receive identification 
string from ::ffff:127.0.0.1



From anderson at centtech.com  Tue Feb  7 12:55:56 2006
From: anderson at centtech.com (Eric Anderson)
Date: Tue, 07 Feb 2006 06:55:56 -0600
Subject: [Linux-cluster] /var/log/secure
In-Reply-To: <43E89482.3060600@irisa.fr>
References: <43E89482.3060600@irisa.fr>
Message-ID: <43E898DC.9080706@centtech.com>

Anthony Assi wrote:
> Hi,
>
> Can anyone tell me what does the following line in the /var/log/secure 
> file means.
>
>
> [genoServer]$ tail -f /var/log/secure
> Feb  7 13:30:02 genoServer sshd[7750]: Did not receive identification 
> string from ::ffff:127.0.0.1


I'm sure google could tell you all kinds of answers here, however, I see 
this on machines with a network monitoring tool installed that is 
checking for sshd to be running.  If you connect to port 22 via telnet 
(telnet localhost 22), and then disconnect, it will probably show you 
that message.

Eric





-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------



From mbrookov at mines.edu  Tue Feb  7 22:38:51 2006
From: mbrookov at mines.edu (Matt Brookover)
Date: Tue, 07 Feb 2006 15:38:51 -0700
Subject: [Linux-cluster] gfs assertion failed on line 242 of file linux_dio.c
Message-ID: <1139351931.4314.20.camel@merlin.Mines.EDU>

One GFS node had a kernel panic.  

I copied this from the console:

kernel panic: GFS: Assertion failed on line 242 of file linux_dio.c
gfs:assertion: "buffer_uptodate (bh)"
gfs:time=1139336367
gfs: fsid=CSM_ACN:u_af.3

The first part of the stack trace was logged by syslog:
Feb  7 11:19:27 inception kernel: Panicking because of write error
Feb  7 11:19:27 inception kernel: f41bfef8 f8f4cb72 00000001 c0384e98
00000000 00000246 00000021 f284bdec
Feb  7 11:19:27 inception kernel:        c01297e3 0000000a 00000400
f8f6e169 cccb6da8 cccb6d60 e35a6b8c f284bdec
Feb  7 11:19:27 inception kernel:        f8f308e5 f8f6c07e f8f6bfba
000000f2 00000013 f98ae000 f284bdec f284bd80
Feb  7 11:19:27 inception kernel: Call Trace:   [<f8f4cb72>] gfs_asserti
[gfs] 0x32 (0xf41bfefc)

The rest of the stack trace was was displayed on the console and I
copied it down.  If it would be of use, I can type it in.  

Any ideas what would cause this?

There are 6 nodes in the cluster running Enterprise 3 update 6 and GFS
6.0.2.27.

thanks

Matt

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060207/41261735/attachment.htm>

From gstaltari at arnet.net.ar  Wed Feb  8 15:53:51 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Wed, 08 Feb 2006 12:53:51 -0300
Subject: [Linux-cluster] cluster name length bug
Message-ID: <43EA140F.4070306@arnet.net.ar>


Hi, while I was trying to setup a new cluster I've found a bug in the 
valid name length of a cluster.
The MAX_CLUSTER_NAME_LEN is 16, but if I put a cluster name of 16 chars 
when a node tries to connect to the cluster the node is refused to 
connect  and in 'dmesg' you can see this:
CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
Searching through the code I found that It was here where the problem 
exists:
-----------------------------------------------------------------------------
                /* Check the cluster name matches */
                if (strcmp(cluster_name, joinmsg->clustername)) {
                        printk(KERN_WARNING CMAN_NAME
                               ": attempt to join with cluster name '%s' 
refused\n",
                               joinmsg->clustername);
                        return -1;
                }
----------------------------------------------------------------------------

The name of the cluster was the same in all nodes: qmail-be-cluster.
It's easy to reproduce the problem. Just create a cluster with a name of 
16 chars and then It's impossible to connect to the cluster.

This is my conf file in all nodes:

<?xml version="1.0"?>
<cluster config_version="13" name="qmail-be-cluster">
        <fence_daemon clean_start="1" post_fail_delay="0" 
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="qmail-be-01" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manopla" 
nodename="qmail-be-01"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="qmail-be-04" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manopla" 
nodename="qmail-be-04"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="qmail-be-05" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="manopla" 
nodename="qmail-be-05"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="manopla"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>


I'm using Fedora Core 4 with this rpm's :

kernel-smp-2.6.14-1.1653_FC4
GFS-6.1.0-3
GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.20
cman-kernel-smp-2.6.11.5-20050601.152643.FC4.18
cman-1.0.0-1
ccs-1.0.0-1
dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.17
dlm-1.0.0-3
fence-1.32.1-1
rgmanager-1.9.34-5
lvm2-2.01.08-2.1
lvm2-cluster-2.01.09-3.0

We need to put this on bugzilla, but I don't know how to do it. :(

Thanks
German Staltari



From pcaulfie at redhat.com  Wed Feb  8 16:29:49 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 08 Feb 2006 16:29:49 +0000
Subject: [Linux-cluster] cluster name length bug
In-Reply-To: <43EA140F.4070306@arnet.net.ar>
References: <43EA140F.4070306@arnet.net.ar>
Message-ID: <43EA1C7D.8010801@redhat.com>

German Staltari wrote:
> 
> Hi, while I was trying to setup a new cluster I've found a bug in the
> valid name length of a cluster.
> The MAX_CLUSTER_NAME_LEN is 16, but if I put a cluster name of 16 chars
> when a node tries to connect to the cluster the node is refused to
> connect  and in 'dmesg' you can see this:
> CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
> CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
> CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
> CMAN: attempt to join with cluster name 'qmail-be-cluster' refused
> Searching through the code I found that It was here where the problem
> exists:
> -----------------------------------------------------------------------------
> 
>                /* Check the cluster name matches */
>                if (strcmp(cluster_name, joinmsg->clustername)) {
>                        printk(KERN_WARNING CMAN_NAME
>                               ": attempt to join with cluster name '%s'
> refused\n",
>                               joinmsg->clustername);
>                        return -1;
>                }
> ----------------------------------------------------------------------------
> 
> 
> The name of the cluster was the same in all nodes: qmail-be-cluster.
> It's easy to reproduce the problem. Just create a cluster with a name of
> 16 chars and then It's impossible to connect to the cluster.
> 
> This is my conf file in all nodes:
> 
> <?xml version="1.0"?>
> <cluster config_version="13" name="qmail-be-cluster">
>        <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="3"/>
>        <clusternodes>
>                <clusternode name="qmail-be-01" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="manopla"
> nodename="qmail-be-01"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="qmail-be-04" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="manopla"
> nodename="qmail-be-04"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="qmail-be-05" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="manopla"
> nodename="qmail-be-05"/>
>                                </method>
>                        </fence>
>                </clusternode>
>        </clusternodes>
>        <cman/>
>        <fencedevices>
>                <fencedevice agent="fence_manual" name="manopla"/>
>        </fencedevices>
>        <rm>
>                <failoverdomains/>
>                <resources/>
>        </rm>
> </cluster>
> 
> 
> I'm using Fedora Core 4 with this rpm's :
> 
> kernel-smp-2.6.14-1.1653_FC4
> GFS-6.1.0-3
> GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.20
> cman-kernel-smp-2.6.11.5-20050601.152643.FC4.18
> cman-1.0.0-1
> ccs-1.0.0-1
> dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.17
> dlm-1.0.0-3
> fence-1.32.1-1
> rgmanager-1.9.34-5
> lvm2-2.01.08-2.1
> lvm2-cluster-2.01.09-3.0
> 
> We need to put this on bugzilla, but I don't know how to do it. :(
> 


we know. It's "fixed" in later packages - the name is restricted to 15 chars !
-- 

patrick



From jpenalbae at gmail.com  Wed Feb  8 23:53:12 2006
From: jpenalbae at gmail.com (=?ISO-8859-1?Q?Jaime_Pe=F1alba?=)
Date: Thu, 9 Feb 2006 00:53:12 +0100
Subject: [Linux-cluster] gfs: Unknown symbol
	blockdev_direct_IO_cluster_locking on RHEL4
Message-ID: <edaeffdd0602081553v4062c9abq@mail.gmail.com>

Hi,

Im getting this error when trying to load gfs module:

[root at apache2 ~]# modprobe gfs
FATAL: Error inserting gfs
(/lib/modules/2.6.9-5.ELsmp/kernel/fs/gfs/gfs.ko): Unknown symbol in
module, or unknown parameter (see dmesg)
gfs: Unknown symbol blockdev_direct_IO_cluster_locking

I have been looking for that function, but i cannot find it defined
anywere, Im using the RHEL4 version from cvs, and the latest redhat
kernel.

Can i safely change this?

return blockdev_direct_IO_cluster_locking(rw, iocb, inode,
                                  inode->i_sb->s_bdev, iov,
                                  offset, nr_segs, gb, NULL);

with "return 1" or something like? Im going to use always lock_dlm.

Anyway there is any other workaround?


Thanks,
Jaime.



From wcheng at redhat.com  Thu Feb  9 04:31:32 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Wed, 08 Feb 2006 23:31:32 -0500
Subject: [Linux-cluster] gfs: Unknown symbol
	blockdev_direct_IO_cluster_locking on RHEL4
In-Reply-To: <edaeffdd0602081553v4062c9abq@mail.gmail.com>
References: <edaeffdd0602081553v4062c9abq@mail.gmail.com>
Message-ID: <1139459492.29567.15.camel@localhost.localdomain>

On Thu, 2006-02-09 at 00:53 +0100, Jaime Pe?alba wrote:
> Hi,
> 
> Im getting this error when trying to load gfs module:
> 
> [root at apache2 ~]# modprobe gfs
> FATAL: Error inserting gfs
> (/lib/modules/2.6.9-5.ELsmp/kernel/fs/gfs/gfs.ko): Unknown symbol in
> module, or unknown parameter (see dmesg)
> gfs: Unknown symbol blockdev_direct_IO_cluster_locking

This is a patch we added to fix a direct IO deadlock issue. We've tried
hard not to have dependency on base kernel but simply couldn't do it in
this case. 

The change has to pair with Red Hat RHEL 4 U3 kernel (>=2.6.9-25) that
is currently in Beta testing. 

If you can build your own base kernel, the base kernel patch is placed
in Red Hat bugzilla 173912 comment 4. If you can't read that bugzilla,
let me know. 

> Can i safely change this?
> 
> return blockdev_direct_IO_cluster_locking(rw, iocb, inode,
>                                   inode->i_sb->s_bdev, iov,
>                                   offset, nr_segs, gb, NULL);
> 
> with "return 1" or something like? Im going to use always lock_dlm.

If you don't use Direct IO at all, this should be ok.

> 
> Anyway there is any other workaround?
> 
> 

Let us know if above can't work out for you.

-- Wendy



From f_schliefer at vcc.de  Thu Feb  9 09:50:24 2006
From: f_schliefer at vcc.de (Frank Schliefer)
Date: Thu, 09 Feb 2006 10:50:24 +0100
Subject: [Linux-cluster] Node lag
In-Reply-To: <43E714CC.7080606@fujitsu.es>
References: <43E714CC.7080606@fujitsu.es>
Message-ID: <43EB1060.7000503@vcc.de>

Hi,

after setting up an four node cluster we have one node that is way 
slower than the other 3 nodes.

We using eg. tiotest for benchmarking the GFS.

Normal Node:
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write          40 MBs |    0.2 s | 227.426 MB/s |  36.4 %  | 384.4 % |
| Random Write   16 MBs |    0.1 s | 143.405 MB/s |  58.7 %  | 146.9 % |
| Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0 % |
| Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9 % |
`----------------------------------------------------------------------'


Slow Node:
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write          40 MBs |    1.4 s |  27.687 MB/s |   2.2 %  | 121.8 % |
| Random Write   16 MBs |    4.2 s |   3.695 MB/s |   0.0 %  |   7.9 % |
| Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1 % |
| Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  | 692.1 % |
`----------------------------------------------------------------------'

any hints why this could happen ??

Using kernel 2.6.15.2 (sorry no RH)

besides that the cluster runs very stable !

best regards

-- 
----

Frank Schliefer



From pcaulfie at redhat.com  Thu Feb  9 14:18:19 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 09 Feb 2006 14:18:19 +0000
Subject: [Linux-cluster] Node lag
In-Reply-To: <43EB1060.7000503@vcc.de>
References: <43E714CC.7080606@fujitsu.es> <43EB1060.7000503@vcc.de>
Message-ID: <43EB4F2B.9030603@redhat.com>

Frank Schliefer wrote:
> Hi,
> 
> after setting up an four node cluster we have one node that is way
> slower than the other 3 nodes.
> 
> We using eg. tiotest for benchmarking the GFS.
> 
> Normal Node:
> Tiotest results for 4 concurrent io threads:
> ,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
> +-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    0.2 s | 227.426 MB/s |  36.4 %  | 384.4 % |
> | Random Write   16 MBs |    0.1 s | 143.405 MB/s |  58.7 %  | 146.9 % |
> | Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0 % |
> | Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9 % |
> `----------------------------------------------------------------------'
> 
> 
> Slow Node:
> Tiotest results for 4 concurrent io threads:
> ,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
> +-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    1.4 s |  27.687 MB/s |   2.2 %  | 121.8 % |
> | Random Write   16 MBs |    4.2 s |   3.695 MB/s |   0.0 %  |   7.9 % |
> | Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1 % |
> | Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  | 692.1 % |
> `----------------------------------------------------------------------'
> 
> any hints why this could happen ??
> 
> Using kernel 2.6.15.2 (sorry no RH)

It would be helpful if you could give us more information about your
installation: disk topology, lock manager in use (and which nodes are
lockservers if using GULM) and whether it matters which nodes are started
first or not.

-- 

patrick



From cjk at techma.com  Thu Feb  9 14:41:24 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 9 Feb 2006 09:41:24 -0500
Subject: [Linux-cluster] Node lag
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E06@tmaemail.techma.com>

Also, I think it might be interesting to see what happens when you use data
sizes that
will overrun any cacheing being done. I've seen great performance using a
simple MSA1000
as long as there is a lot of cache available on the SAN itself. As soon as I
run tests with
data sets larger then the cache size, the performance falls to the floor.
Unless your over
loading the cache, you might not be getting a true metric of whats really
getting written 
to disk.

Maybe the slow node is getting hit by cache overhead from the SAN? 


Just a thought


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Thursday, February 09, 2006 9:18 AM
To: linux clustering
Subject: Re: [Linux-cluster] Node lag

Frank Schliefer wrote:
> Hi,
> 
> after setting up an four node cluster we have one node that is way 
> slower than the other 3 nodes.
> 
> We using eg. tiotest for benchmarking the GFS.
> 
> Normal Node:
> Tiotest results for 4 concurrent io threads:
> ,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
> +-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    0.2 s | 227.426 MB/s |  36.4 %  | 384.4 % |
> | Random Write   16 MBs |    0.1 s | 143.405 MB/s |  58.7 %  | 146.9 % |
> | Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0 % |
> | Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9 % |
> `----------------------------------------------------------------------'
> 
> 
> Slow Node:
> Tiotest results for 4 concurrent io threads:
> ,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
> +-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    1.4 s |  27.687 MB/s |   2.2 %  | 121.8 % |
> | Random Write   16 MBs |    4.2 s |   3.695 MB/s |   0.0 %  |   7.9 % |
> | Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1 % |
> | Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  | 692.1 % |
> `----------------------------------------------------------------------'
> 
> any hints why this could happen ??
> 
> Using kernel 2.6.15.2 (sorry no RH)

It would be helpful if you could give us more information about your
installation: disk topology, lock manager in use (and which nodes are
lockservers if using GULM) and whether it matters which nodes are started
first or not.

-- 

patrick

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From celso at webbertek.com.br  Thu Feb  9 15:18:57 2006
From: celso at webbertek.com.br (Celso K. Webber)
Date: Thu, 09 Feb 2006 13:18:57 -0200
Subject: [Linux-cluster] Fencing agents on DELL PowerEdge servers
Message-ID: <43EB5D61.8010008@webbertek.com.br>

Hello all,

Would someone please tell me if there is a way to use a Dell DRAC card 
as a fencing device under Red Hat Cluster Suite v3? I could successfully 
configure the DRACs and Cluster Suite v4 to use the Dell server's DRACs 
as fencing devices under Red Hat Cluster Suite v4, but I could not find 
a way to do that under CS v3 since the fencing agents are in binary 
format, not in shell script format as they are in CS v4.

Alternatively, does anyone know if it is possible to use Dell's 
"ipmitool" and "ipmish" commands over the server's BMC IP so that I 
could build a fencing agent script for Cluster Suite v4?

Thanks for your advice.

Celso.



From jon at csail.mit.edu  Thu Feb  9 15:52:34 2006
From: jon at csail.mit.edu (Jonathan D. Proulx)
Date: Thu, 9 Feb 2006 10:52:34 -0500
Subject: [Linux-cluster] Where's the documentation?
Message-ID: <20060209155234.GB10253@csail.mit.edu>


I have a piel of dumb questions that are best answered with RTFM, but
the only FM I seem to have is a couple of small README files in the
source diurectory.

[jon at borg1 src]$ ls cluster-1.01.00/doc/
min-gfs.txt  usage.txt

This was enough for me to get it working but I still have a
particularly poor understanding of fencing and shut down procedure (I
can get teh system to start cold, but if I bring a node down it
hangs.  seems to still be trying to leave the cluster after the
network goes down, bu tI have the clust shutdown scripts way before
the network shutdown scripts).

The only reference I can find in the archives points to the same two
documents (which are marvels of brevity, and as I said did get me
working).  Is there anything a bit more in depth?

Thanks,
-Jon



From mwill at penguincomputing.com  Thu Feb  9 17:08:50 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Thu, 9 Feb 2006 09:08:50 -0800
Subject: [Linux-cluster] Node lag - GFS performance numbers of 4 node
	cluster with tiotest
Message-ID: <9D61626573F86843908C9AE8BDB4390555EAE9@jellyfish.highlyscyld.com>

1. How much ram has each node
2. do they have fibre attached storage, and which, how many disks,
raid-controller cache size, or is it local disks, and again, what raid
controller and how many disks are you using?
3. when running tiotest, does it does it depend from where you run it on
what performance you see? Do those four threads run one on each node or
is it one node running four threads?

My first reaction to the numbers is:

2500MB/s sounds like it's not coming from disk but from RAM.

Michael

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Thursday, February 09, 2006 6:18 AM
To: linux clustering
Subject: Re: [Linux-cluster] Node lag

Frank Schliefer wrote:
> Hi,
> 
> after setting up an four node cluster we have one node that is way 
> slower than the other 3 nodes.
> 
> We using eg. tiotest for benchmarking the GFS.
> 
> Normal Node:
> Tiotest results for 4 concurrent io threads:
>
,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU
|
>
+-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    0.2 s | 227.426 MB/s |  36.4 %  | 384.4 %
|
> | Random Write   16 MBs |    0.1 s | 143.405 MB/s |  58.7 %  | 146.9 %
|
> | Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0
% |
> | Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9
% |
>
`----------------------------------------------------------------------'
> 
> 
> Slow Node:
> Tiotest results for 4 concurrent io threads:
>
,----------------------------------------------------------------------.
> | Item                  | Time     | Rate         | Usr CPU  | Sys CPU
|
>
+-----------------------+----------+--------------+----------+---------+
> | Write          40 MBs |    1.4 s |  27.687 MB/s |   2.2 %  | 121.8 %
|
> | Random Write   16 MBs |    4.2 s |   3.695 MB/s |   0.0 %  |   7.9 %
|
> | Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1
% |
> | Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  | 692.1
% |
>
`----------------------------------------------------------------------'
> 
> any hints why this could happen ??
> 
> Using kernel 2.6.15.2 (sorry no RH)

It would be helpful if you could give us more information about your
installation: disk topology, lock manager in use (and which nodes are
lockservers if using GULM) and whether it matters which nodes are
started first or not.

-- 

patrick

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From thaidn at gmail.com  Thu Feb  9 17:17:22 2006
From: thaidn at gmail.com (Thai Duong)
Date: Fri, 10 Feb 2006 00:17:22 +0700
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line 550 of
	file rgrp.c
Message-ID: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>

Hi all,

My story is a little bit long so your patience is highly appreciated. I have
a 4-node Oracle9iR2 RAC using GFS 6.0 as the cluster file system. Each node
is running RHAS3 Update 6 for Itanium. We have 2 main pools: oracle_u01 to
store the $ORACLE_HOME, and oracle_u02 to store the datafile, controlfile,
redolog...Many thanks to the stabilization of GFS, this production cluster
have been running pretty well. Two days ago, the manager wants to have a
backup cluster with the same configuration except it consits of only one
node. Due to the lack of 64-bit servers, we got a node from the production
cluster out to make it become the 1-node backup cluster. We make new LUNs in
our EMC CX500 SAN, and clone the production's oracle_u01 and oracle_u02 to
them. We are able to mount these pools as GFS file system onto the backup
cluster's node. So far so good until we try to write to oracle_u02. Kernel
panic: GFS: Assertion failed on line 550 of file rgrp.c. It's pretty weird
cauze writing to oracle_u01 just works. They are just the same pool device
containing GFS file system, arent they? We even can not run df against u02!
We can mount and explore it using commands like cd, ls, dir...but not df!

There's no other message in /var/log/message, just the above error. If you
need any other information, please drop me a line! Please help.

Regards,

--Thai Duong
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060210/2f00a27c/attachment.htm>

From lhh at redhat.com  Thu Feb  9 17:27:49 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 09 Feb 2006 12:27:49 -0500
Subject: [Linux-cluster] Fencing agents on DELL PowerEdge servers
In-Reply-To: <43EB5D61.8010008@webbertek.com.br>
References: <43EB5D61.8010008@webbertek.com.br>
Message-ID: <1139506069.19978.0.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-09 at 13:18 -0200, Celso K. Webber wrote:
> Hello all,
> 
> Would someone please tell me if there is a way to use a Dell DRAC card 
> as a fencing device under Red Hat Cluster Suite v3? I could successfully 
> configure the DRACs and Cluster Suite v4 to use the Dell server's DRACs 
> as fencing devices under Red Hat Cluster Suite v4, but I could not find 
> a way to do that under CS v3 since the fencing agents are in binary 
> format, not in shell script format as they are in CS v4.

DRAC would be tough

> 
> Alternatively, does anyone know if it is possible to use Dell's 
> "ipmitool" and "ipmish" commands over the server's BMC IP so that I 
> could build a fencing agent script for Cluster Suite v4?

fence_ipmilan uses ipmitool, and the same code works on RHCS3 and RHCS4
(different compile time option)

-- Lon



From kanderso at redhat.com  Thu Feb  9 18:35:00 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Thu, 09 Feb 2006 12:35:00 -0600
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line
	550 of file rgrp.c
In-Reply-To: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
References: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
Message-ID: <1139510101.3034.62.camel@dhcp80-219.msp.redhat.com>

On Fri, 2006-02-10 at 00:17 +0700, Thai Duong wrote:
> Hi all,
> 
> My story is a little bit long so your patience is highly appreciated.
> I have a 4-node Oracle9iR2 RAC using GFS 6.0 as the cluster file
> system. Each node is running RHAS3 Update 6 for Itanium. We have 2
> main pools: oracle_u01 to store the $ORACLE_HOME, and oracle_u02 to
> store the datafile, controlfile, redolog...Many thanks to the
> stabilization of GFS, this production cluster have been running pretty
> well. Two days ago, the manager wants to have a backup cluster with
> the same configuration except it consits of only one node. Due to the
> lack of 64-bit servers, we got a node from the production cluster out
> to make it become the 1-node backup cluster. We make new LUNs in our
> EMC CX500 SAN, and clone the production's oracle_u01 and oracle_u02 to
> them. We are able to mount these pools as GFS file system onto the
> backup cluster's node. So far so good until we try to write to
> oracle_u02. Kernel panic: GFS: Assertion failed on line 550 of file
> rgrp.c. It's pretty weird cauze writing to oracle_u01 just works. They
> are just the same pool device containing GFS file system, arent they?
> We even can not run df against u02! We can mount and explore it using
> commands like cd, ls, dir...but not df! 

Purely speculation, but it sounds like when you snapped the LUN, there
was filesystem metadata that was not consistent on the storage.  This
can happen if you did not umount the filesystem or freeze it before
doing the clone.  Have you run fsck?

Kevin




From bobby.m.dalton at nasa.gov  Thu Feb  9 17:49:58 2006
From: bobby.m.dalton at nasa.gov (Dalton, Maurice)
Date: Thu, 9 Feb 2006 11:49:58 -0600
Subject: [Linux-cluster] Problem with RHEL 4.0 NFS service
Message-ID: <EB190CD1E73E1146ACB7694746E205A801655CA8@hx1.ums.msfc.nasa.gov>

I am trying to create an RHEL4.0 NFS service. I have a 2 system cluster.
Cluster.conf file is as below

<?xml version="1.0"?>
<cluster config_version="19" name="dpims_cluster">
        <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="pvta09-eth1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="pvta10-eth1" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="dpimstst" ordered="0"
restricted="1">
                                <failoverdomainnode name="pvta09-eth1"
priority="1"/>
                                <failoverdomainnode name="pvta10-eth1"
priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <fs device="/dev/sdb1" force_unmount="1"
fstype="ext3" mountpoint="/disk1" name="dpims" options="rw"/>
                        <ip address="198.1xx.xxx.xxx" monitor_link="1"/>
                        <netfs export="/disk1" force_unmount="1"
fstype="nfs4" host="" mountpoint="/disk1" name="dpimstst" options="rw"/>
                        <nfsexport name="disk1"/>
                        <nfsclient name="export" options="rw"
target="sysmgr02"/>
                </resources>
                <service autostart="1" domain="dpimstst" name="dpims">
                        <ip ref="198.1xx.xxx.xxx"/>
                        <fs ref="dpims">
                                <nfsexport ref="disk1"/>
                                <nfsclient ref="export"/>
                        </fs>
                </service>
        </rm>
</cluster>



Rgmanager debug log.

Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <notice> Resource Group Manager
Starting
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> Loading Service Data
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> Initializing Services
Feb  9 17:14:59 pvt10 clurgmgrd: [12877]: <debug> 198.1xx.xxx.xxx is not
configured
Feb  9 17:14:59 pvt10 clurgmgrd: [12877]: <err> No export path
specified.
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <notice> stop on nfsclient
"export" returned 2 (invalid argument(s))
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> Services Initialized
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> Logged in SG
"usrm::manager"
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> Magma Event: Membership
Change
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <info> State change: Local UP
Feb  9 17:14:59 pvt10 clurgmgrd[12877]: <err> #33: Unable to obtain
cluster lock: Operation not permitted


What am I doing wrong ????


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060209/615ea817/attachment.htm>

From agk at redhat.com  Thu Feb  9 20:14:29 2006
From: agk at redhat.com (Alasdair G Kergon)
Date: Thu, 9 Feb 2006 20:14:29 +0000
Subject: [Linux-cluster] Mailing list administrivia
Message-ID: <20060209201429.GQ12169@agk.surrey.redhat.com>

I have just caught up with the backlog of spam sent to these lists.

To reduce the risk of genuine messages you post going astray:

  Please send your messages from an email address that is subscribed 
  to the list.  Messages from unsubscribed addresses get caught up
  with the spam.  If they have a sensible subject line, they'll 
  stand out from the spam and then the email addresses are added to 
  a whitelist and allowed to post to the list directly in future.

  Please do not send large messages, HTML email or messages with 
  attachments.  Such messages will sometimes get through; other times 
  they won't.  Instead, put the data on the web somewhere and send the 
  URL to the list.

Alasdair
-- 
agk at redhat.com



From dillo+cluster at seas.upenn.edu  Thu Feb  9 20:19:55 2006
From: dillo+cluster at seas.upenn.edu (Bryan Cardillo)
Date: Thu, 9 Feb 2006 15:19:55 -0500
Subject: [Linux-cluster] Problem with RHEL 4.0 NFS service
In-Reply-To: <EB190CD1E73E1146ACB7694746E205A801655CA8@hx1.ums.msfc.nasa.gov>
References: <EB190CD1E73E1146ACB7694746E205A801655CA8@hx1.ums.msfc.nasa.gov>
Message-ID: <20060209201954.GB3305@rover.pcbi.upenn.edu>

On Thu, Feb 09, 2006 at 11:49:58AM -0600, Dalton, Maurice wrote:
>                 <service autostart="1" domain="dpimstst" name="dpims">
>                         <ip ref="198.1xx.xxx.xxx"/>
>                         <fs ref="dpims">
>                                 <nfsexport ref="disk1"/>
>                                 <nfsclient ref="export"/>
>                         </fs>
>                 </service>

        I believe this should be something like:

        <nfsexport ref="disk1">
            <nfsclient ref="export"/>
        </nfsexport>

        in other words, the client needs to be nested below the
        export.

        Cheers,
        Bryan Cardillo
        Penn Bioinformatics Core
        University of Pennsylvania



From jpenalbae at gmail.com  Thu Feb  9 21:48:06 2006
From: jpenalbae at gmail.com (=?ISO-8859-1?Q?Jaime_Pe=F1alba?=)
Date: Thu, 9 Feb 2006 22:48:06 +0100
Subject: [Linux-cluster] gfs: Unknown symbol
	blockdev_direct_IO_cluster_locking on RHEL4
In-Reply-To: <1139459492.29567.15.camel@localhost.localdomain>
References: <edaeffdd0602081553v4062c9abq@mail.gmail.com>
	<1139459492.29567.15.camel@localhost.localdomain>
Message-ID: <edaeffdd0602091348o4b7ba10du@mail.gmail.com>

Hi Wendy,

Thanks for the fast answer, i have applied your patch, recompiled and
its working fine now.

The patch was already included on the 2.6.9-22 source, but it isnt
applied by default.

Regards,
Jaime.


2006/2/9, Wendy Cheng <wcheng at redhat.com>:
> On Thu, 2006-02-09 at 00:53 +0100, Jaime Pe?alba wrote:
> > Hi,
> >
> > Im getting this error when trying to load gfs module:
> >
> > [root at apache2 ~]# modprobe gfs
> > FATAL: Error inserting gfs
> > (/lib/modules/2.6.9-5.ELsmp/kernel/fs/gfs/gfs.ko): Unknown symbol in
> > module, or unknown parameter (see dmesg)
> > gfs: Unknown symbol blockdev_direct_IO_cluster_locking
>
> This is a patch we added to fix a direct IO deadlock issue. We've tried
> hard not to have dependency on base kernel but simply couldn't do it in
> this case.
>
> The change has to pair with Red Hat RHEL 4 U3 kernel (>=2.6.9-25) that
> is currently in Beta testing.
>
> If you can build your own base kernel, the base kernel patch is placed
> in Red Hat bugzilla 173912 comment 4. If you can't read that bugzilla,
> let me know.
>
> > Can i safely change this?
> >
> > return blockdev_direct_IO_cluster_locking(rw, iocb, inode,
> >                                   inode->i_sb->s_bdev, iov,
> >                                   offset, nr_segs, gb, NULL);
> >
> > with "return 1" or something like? Im going to use always lock_dlm.
>
> If you don't use Direct IO at all, this should be ok.
>
> >
> > Anyway there is any other workaround?
> >
> >
>
> Let us know if above can't work out for you.
>
> -- Wendy
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From mwill at penguincomputing.com  Thu Feb  9 21:56:39 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Thu, 9 Feb 2006 13:56:39 -0800
Subject: [Linux-cluster] largest CIFS/Samba  export? largest GFS export?
Message-ID: <9D61626573F86843908C9AE8BDB4390555EB4D@jellyfish.highlyscyld.com>

What are current size limits for large storage pools. I was considering
aggragating 20 2TB luns on a fibre SAN with LVM2 into one 40TB volume,
create a filesystem on it with XFS and export it both with NFS v3 to
linux clients
as well as with samba to windows clients.
 
The linux clients are all 2.6 kernel 64bit, so they should not have an
issue with this,
but what about the 32bit windows clients running windows2003 and
windowsxp ? 
 
Do you have any experience exporting larger than 2TB filesystems to
windows clients through samba?
 
Does GFS have similar restrictions (linux clients only assumed) ?
 
Michael 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060209/737b83e5/attachment.htm>

From pegasus at nerv.eu.org  Thu Feb  9 22:17:18 2006
From: pegasus at nerv.eu.org (Jure =?ISO-8859-2?Q?Pe=E8ar?=)
Date: Thu, 9 Feb 2006 23:17:18 +0100
Subject: [Linux-cluster] disjunct services in cluster
Message-ID: <20060209231718.62e290d4.pegasus@nerv.eu.org>


Hi all,

I need to configure two services in my cluster that must run on different
nodes at all times. I don't want them to run on the same node.

I don't see a clear way how to do that, neither from documentation nor from
cluster config gui. Is it possible at all?


-- 

Jure Pe?ar
http://jure.pecar.org/



From mwill at penguincomputing.com  Thu Feb  9 22:53:16 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Thu, 9 Feb 2006 14:53:16 -0800
Subject: [Linux-cluster] largest CIFS/Samba  export? largest GFS export?
Message-ID: <9D61626573F86843908C9AE8BDB4390555EB66@jellyfish.highlyscyld.com>

http://storagemagazine.techtarget.com/magPrintFriendly/0,293813,sid35_gc
i1150389,00.html
<http://storagemagazine.techtarget.com/magPrintFriendly/0,293813,sid35_g
ci1150389,00.html>  suggests
that CIFS can export >320TB filesystems in the case of panasas, 16TB in
the case of GFS. None if this talks about 
limits on the 32bit windows client side though. Maybe there is none for
the filesystem but only for the file size? (4G?)
 
Michael

________________________________

From: linux-cluster-bounces at redha   t.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Will
Sent: Thursday, February 09, 2006 1:57 PM
To: linux clustering
Subject: [Linux-cluster] largest CIFS/Samba export? largest GFS export?


What are current size limits for large storage pools. I was considering
aggragating 20 2TB luns on a fibre SAN with LVM2 into one 40TB volume,
create a filesystem on it with XFS and export it both with NFS v3 to
linux clients
as well as with samba to windows clients.
 
The linux clients are all 2.6 kernel 64bit, so they should not have an
issue with this,
but what about the 32bit windows clients running windows2003 and
windowsxp ? 
 
Do you have any experience exporting larger than 2TB filesystems to
windows clients through samba?
 
Does GFS have similar restrictions (linux clients only assumed) ?
 
Michael 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060209/7ac5af6f/attachment.htm>

From wcheng at redhat.com  Fri Feb 10 03:33:16 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Thu, 09 Feb 2006 22:33:16 -0500
Subject: [Linux-cluster] gfs: Unknown symbol
	blockdev_direct_IO_cluster_locking on RHEL4
In-Reply-To: <edaeffdd0602091348o4b7ba10du@mail.gmail.com>
References: <edaeffdd0602081553v4062c9abq@mail.gmail.com>
	<1139459492.29567.15.camel@localhost.localdomain>
	<edaeffdd0602091348o4b7ba10du@mail.gmail.com>
Message-ID: <1139542396.3399.6.camel@localhost.localdomain>

On Thu, 2006-02-09 at 22:48 +0100, Jaime Pe?alba wrote:
> Hi Wendy,
> 
> Thanks for the fast answer, i have applied your patch, recompiled and
> its working fine now.
> 
> The patch was already included on the 2.6.9-22 source, but it isnt
> applied by default.
> 

Glad this works out for you. 

One way to avoid the issue (from our end) is to change these locking
calls into macros and do different things based on base kernel version.
Will add this into todo list.

-- Wendy 



From danwest at comcast.net  Fri Feb 10 12:16:11 2006
From: danwest at comcast.net (danwest)
Date: Fri, 10 Feb 2006 07:16:11 -0500
Subject: [Linux-cluster] strange modprobe drops VIP & fails service
Message-ID: <1139573771.6972.9.camel@linux.site>

Has anyone every seen something like this?  A modprobe with either a
"module 0" or a blank "module " is always followed by the interface
loosing its VIP and thus initiating a service failure.  All non
essential system services were turned off.  Not sure what is
calling/causing the strange modprobe.  Logs are clean, no other events
leading up to this.  I am now wondering if it could be the cluster
itself?

Feb  9 13:44:32 node2 modprobe: modprobe: Can't locate module 0
Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: IP
address 147.107.188.81 missing
Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: 0: error
fetching interface information: Device not found
Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: Check
status failed on IP addresses for mservice1

Thanks,
 Dan



From michael.weitzel at uni-siegen.de  Fri Feb 10 13:54:21 2006
From: michael.weitzel at uni-siegen.de (Michael Weitzel)
Date: Fri, 10 Feb 2006 14:54:21 +0100
Subject: [Linux-cluster] GFS on iSCSI+LVM
Message-ID: <43EC9B0D.3020508@uni-siegen.de>

Hi,

after some time of successfull stress-testing I got an I/O-error after
some time and the following message in the system log:

GFS: fsid=Isabella:301bis304.3: fatal: invalid metadata block
GFS: fsid=Isabella:301bis304.3:   bh = 39254347 (type: exp=4, found=0)
GFS: fsid=Isabella:301bis304.3:   function = gfs_get_meta_buffer
GFS: fsid=Isabella:301bis304.3:   file =
/usr/src/packages/BUILD/cluster-cvs.STABLE/gfs-kernel/src/gfs/dio.c,
line = 1223
GFS: fsid=Isabella:301bis304.3:   time = 1139576449
GFS: fsid=Isabella:301bis304.3: about to withdraw from the cluster
GFS: fsid=Isabella:301bis304.3: waiting for outstanding I/O
GFS: fsid=Isabella:301bis304.3: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=Isabella:301bis304.3: withdrawn

gfs_fsck was able to correct this error. My "playground"-config is
probably rather unusual *g*
(I'm trying to use spare disks in cluster nodes to emulate a SAN):

 - 4 nodes of a cluster export spare disks via iSCSI (ietd)
 - 8 nodes (the 4 ietd-nodes included) import the iSCSI-targets
   with open-iscsi (v. 1.0-485) (nodes do not import their own exported
   disks)
 - the four local/imported disks form a LVM volume-group with a single
   large logical volume with GFS/lock_dlm/clvmd

Is clock-syncronization important? What caused this error?
Are there any experiences with iSCSI+Linux Software RAID (Level 1)+GFS?
-- 
Michael Weitzel



From E.H.Beekman at amc.nl  Fri Feb 10 14:22:50 2006
From: E.H.Beekman at amc.nl (Ewald Beekman)
Date: Fri, 10 Feb 2006 15:22:50 +0100
Subject: [Linux-cluster] anyone experience with an itanium cluster running
	oracle ?
Message-ID: <20060210142249.GU25789@core.amc.uva.nl>

We want to run a HA failover cluster on
HP Itaniums (DL740's) with Oracle 9i on top.
So RHEL4 (AS for itanium) and Red Hat Cluster with GFS
(the commercial supported stuff). 
Oracle RAC would not be installed, so all HA should
come from the OS cluster.

Anyone experience with such a setup? Would you recommend
against it? Wat are the caveats?

tia,
Ewald...

-- 
Ewald Beekman, Security Engineer, Academic Medical Center,
dept. ADB/ICT Computer & Network Services, The Netherlands
## Your mind-mint is:
Round Numbers are always false.
		-- Samuel Johnson



From gforte at leopard.us.udel.edu  Fri Feb 10 14:33:13 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Fri, 10 Feb 2006 09:33:13 -0500
Subject: [Linux-cluster] anyone experience with an itanium cluster running
	oracle ?
In-Reply-To: <20060210142249.GU25789@core.amc.uva.nl>
References: <20060210142249.GU25789@core.amc.uva.nl>
Message-ID: <43ECA429.5040205@leopard.us.udel.edu>

I am slowly building something similar, though with 10g and ES and not
Itanium (EM64T, though), and the outcome so far is a qualified "maybe".
Oracle is mostly OK with failing over from node to node, using just init
scripts I've crafted and the standard CS stuff - the database engine and
the listener work fine, as long as I temporarily change the hostname
during startup (and shutdown) of the services to the one it expects to
see (which is an alias associated with a floating IP address that always
gets assigned to the node running Oracle, anyway), but Enterprise
Manager is not so happy - parts of it work, parts of it don't,
specifically backup scheduling, patch grabbing/application, and
communication with the listener.  Other services that use Oracle have
been set up and are happy as clams, though.  I'm still waiting for some
help from Oracle on what the issue might be with EM.  I'll post further
info to the list as it comes ... if anyone else has been here before,
please do pipe up!

-g

Ewald Beekman wrote:
> We want to run a HA failover cluster on
> HP Itaniums (DL740's) with Oracle 9i on top.
> So RHEL4 (AS for itanium) and Red Hat Cluster with GFS
> (the commercial supported stuff). 
> Oracle RAC would not be installed, so all HA should
> come from the OS cluster.
> 
> Anyone experience with such a setup? Would you recommend
> against it? Wat are the caveats?
> 
> tia,
> Ewald...
> 



From kpodesta at redbrick.dcu.ie  Fri Feb 10 15:34:05 2006
From: kpodesta at redbrick.dcu.ie (Karl Podesta)
Date: Fri, 10 Feb 2006 15:34:05 +0000
Subject: [Linux-cluster] disjunct services in cluster
In-Reply-To: <20060209231718.62e290d4.pegasus@nerv.eu.org>
References: <20060209231718.62e290d4.pegasus@nerv.eu.org>
Message-ID: <20060210153405.GF31810@carbon.redbrick.dcu.ie>

On Thu, Feb 09, 2006 at 11:17:18PM +0100, Jure Pe?ar wrote:
> 
> Hi all,
> 
> I need to configure two services in my cluster that must run on different
> nodes at all times. I don't want them to run on the same node.
> 
> I don't see a clear way how to do that, neither from documentation nor from
> cluster config gui. Is it possible at all?

What version of Cluster Suite are you running? 

There is an option (checkbox) somewhere in the GUI of Cluster Suite 4 
to allow "run exclusively on this node", I think in the section for
adding a service. Can't remember seeing it in the documentation, I 
remember it being hard to find.  

Alternatively you could try creating a failover domain specifically
for that service, so that it will only failover (once started) onto 
nodes you specify in that domain, although the above solution is the
one you should probably use if you can find it.  

Kp 
-- 
Karl Podesta
Systems Engineer
Securelinx Ltd 
Ireland



From thaidn at gmail.com  Fri Feb 10 15:51:25 2006
From: thaidn at gmail.com (Thai Duong)
Date: Fri, 10 Feb 2006 22:51:25 +0700
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line 550
	offile rgrp.c
In-Reply-To: <BAY101-F338F7F23E65CFC60C161F1FA030@phx.gbl>
References: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
	<BAY101-F338F7F23E65CFC60C161F1FA030@phx.gbl>
Message-ID: <d4e2d9970602100751g3d7b508ds4873b5253e80aa58@mail.gmail.com>

Hi David,

There's no file in /var/crash. What am I supposed to do now? Please help.

Thai Duong.

On 2/10/06, David Gutierrez <davegu1 at hotmail.com> wrote:
>
> Check your /var/crash directory and it should have created a core file for
> you to debug, Let me know if you need help with it.
>
> The line 550 will tell you what the problem is from the file which was
> reading at the time of the crash.
>
> try to write to oracle_u02. Kernel panic: GFS: Assertion failed on line
> 550
> of file rgrp.c
>
> David
>
>
> From: Thai Duong <thaidn at gmail.com>
> Reply-To: linux clustering <linux-cluster at redhat.com>
> To: linux clustering <linux-cluster at redhat.com>
> Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line 550
> offile rgrp.c
> Date: Fri, 10 Feb 2006 00:17:22 +0700
> MIME-Version: 1.0
> Received: from hormel.redhat.com ([209.132.177.30]) by
> bay0-mc12-f1.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 9
> Feb 2006 09:17:42 -0800
> Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com
> [10.8.4.110])by hormel.redhat.com (Postfix) with ESMTPid 1D0C372F78;
> Thu,  9
> Feb 2006 12:17:42 -0500 (EST)
> Received: from int-mx1.corp.redhat.com
> (int-mx1.corp.redhat.com[172.16.52.254])by listman.util.phx.redhat.com
> (8.13.1/8.13.1) with ESMTP idk19HHeLE012080 for
> <linux-cluster at listman.util.phx.redhat.com>;Thu, 9 Feb 2006 12:17:40 -0500
> Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])by
> int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id k19HHd106088for
> <linux-cluster at redhat.com>; Thu, 9 Feb 2006 12:17:39 -0500
> Received: from xproxy.gmail.com (xproxy.gmail.com [66.249.82.205])by
> mx3.redhat.com (8.13.1/8.13.1) with ESMTP id k19HHVl7018430for
> <linux-cluster at redhat.com>; Thu, 9 Feb 2006 12:17:31 -0500
> Received: by xproxy.gmail.com with SMTP id s18so140501wxcfor
> <linux-cluster at redhat.com>; Thu, 09 Feb 2006 09:17:26 -0800 (PST)
> Received: by 10.70.19.6 with SMTP id 6mr262192wxs;Thu, 09 Feb 2006
> 09:17:23
> -0800 (PST)
> Received: by 10.70.88.1 with HTTP; Thu, 9 Feb 2006 09:17:22 -0800 (PST)
> X-Message-Info: yilqo4+6kc46QQYnau3sRaNMYGNvwlwM82qFzuzDRDY=
> DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta;
> d=gmail.com
> ;h=received:message-id:date:from:to:subject:mime-version:content-type;b=Qo4QO2lKhn4K678nWOMfiwWHXvV71swnE12ulh/RbrvGFYndtiR9E0XICuvelrJg++v0nNT0ZVAhnKUUNMU/SEASR8ddTz/4oLent+xyNJye1zrGQRIrUHl1EFw1jk/I/vYC6F8p/J+KB2PpZpbq8xixOvVnJtaHdo1YevY90Z4=
> X-RedHat-Spam-Score: -2.579
> X-loop: linux-cluster at redhat.com
> X-BeenThere: linux-cluster at redhat.com
> X-Mailman-Version: 2.1.5
> Precedence: junk
> List-Id: linux clustering <linux-cluster.redhat.com>
> List-Unsubscribe:
> <https://www.redhat.com/mailman/listinfo/linux-cluster>,<mailto:
> linux-cluster-request at redhat.com?subject=unsubscribe>
> List-Archive: <https://www.redhat.com/archives/linux-cluster>
> List-Post: <mailto:linux-cluster at redhat.com>
> List-Help: <mailto:linux-cluster-request at redhat.com?subject=help>
> List-Subscribe:
> <https://www.redhat.com/mailman/listinfo/linux-cluster>,<mailto:
> linux-cluster-request at redhat.com?subject=subscribe>
> Errors-To: linux-cluster-bounces at redhat.com
> Return-Path: linux-cluster-bounces at redhat.com
> X-OriginalArrivalTime: 09 Feb 2006 17:17:42.0357 (UTC)
> FILETIME=[BC473850:01C62D9C]
>
> Hi all,
>
> My story is a little bit long so your patience is highly appreciated. I
> have
> a 4-node Oracle9iR2 RAC using GFS 6.0 as the cluster file system. Each
> node
> is running RHAS3 Update 6 for Itanium. We have 2 main pools: oracle_u01 to
> store the $ORACLE_HOME, and oracle_u02 to store the datafile, controlfile,
> redolog...Many thanks to the stabilization of GFS, this production cluster
> have been running pretty well. Two days ago, the manager wants to have a
> backup cluster with the same configuration except it consits of only one
> node. Due to the lack of 64-bit servers, we got a node from the production
> cluster out to make it become the 1-node backup cluster. We make new LUNs
> in
> our EMC CX500 SAN, and clone the production's oracle_u01 and oracle_u02 to
> them. We are able to mount these pools as GFS file system onto the backup
> cluster's node. So far so good until we try to write to oracle_u02. Kernel
> panic: GFS: Assertion failed on line 550 of file rgrp.c. It's pretty weird
> cauze writing to oracle_u01 just works. They are just the same pool device
> containing GFS file system, arent they? We even can not run df against
> u02!
> We can mount and explore it using commands like cd, ls, dir...but not df!
>
> There's no other message in /var/log/message, just the above error. If you
> need any other information, please drop me a line! Please help.
>
> Regards,
>
> --Thai Duong
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060210/83773fad/attachment.htm>

From thaidn at gmail.com  Fri Feb 10 15:59:05 2006
From: thaidn at gmail.com (Thai Duong)
Date: Fri, 10 Feb 2006 22:59:05 +0700
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line 550
	of file rgrp.c
In-Reply-To: <1139510101.3034.62.camel@dhcp80-219.msp.redhat.com>
References: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
	<1139510101.3034.62.camel@dhcp80-219.msp.redhat.com>
Message-ID: <d4e2d9970602100759p5c2e1632mccc66bcdac3973a2@mail.gmail.com>

Hi Kevin,

I did unmount oracle_u02 before cloning but still no luck. When I tried to
run gfs_fsck against oracle_u02 on the backup cluster's node, it reported
something like below:

[root at rac3 root]# gfs_fsck -y /dev/pool/oracle_u02
Initializing fsck
Buffer #65773 (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.
Resource group is corrupted.
Unable to read in rgrp descriptor.
Unable to fill in resource group information.

It seems that oracle_u02 somehow got broken. Running gfs_fsck against
oracle_u01 works like a charm. Do i need to run gfs_fsck against the
original oracle_u02? Please advise.

Regards,

Thai Duong.

On 2/10/06, Kevin Anderson <kanderso at redhat.com> wrote:
>
> On Fri, 2006-02-10 at 00:17 +0700, Thai Duong wrote:
> > Hi all,
> >
> > My story is a little bit long so your patience is highly appreciated.
> > I have a 4-node Oracle9iR2 RAC using GFS 6.0 as the cluster file
> > system. Each node is running RHAS3 Update 6 for Itanium. We have 2
> > main pools: oracle_u01 to store the $ORACLE_HOME, and oracle_u02 to
> > store the datafile, controlfile, redolog...Many thanks to the
> > stabilization of GFS, this production cluster have been running pretty
> > well. Two days ago, the manager wants to have a backup cluster with
> > the same configuration except it consits of only one node. Due to the
> > lack of 64-bit servers, we got a node from the production cluster out
> > to make it become the 1-node backup cluster. We make new LUNs in our
> > EMC CX500 SAN, and clone the production's oracle_u01 and oracle_u02 to
> > them. We are able to mount these pools as GFS file system onto the
> > backup cluster's node. So far so good until we try to write to
> > oracle_u02. Kernel panic: GFS: Assertion failed on line 550 of file
> > rgrp.c. It's pretty weird cauze writing to oracle_u01 just works. They
> > are just the same pool device containing GFS file system, arent they?
> > We even can not run df against u02! We can mount and explore it using
> > commands like cd, ls, dir...but not df!
>
> Purely speculation, but it sounds like when you snapped the LUN, there
> was filesystem metadata that was not consistent on the storage.  This
> can happen if you did not umount the filesystem or freeze it before
> doing the clone.  Have you run fsck?
>
> Kevin
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060210/0158a186/attachment.htm>

From pegasus at nerv.eu.org  Fri Feb 10 16:04:11 2006
From: pegasus at nerv.eu.org (Jure =?UTF-8?Q?Pe=C4=8Dar?=)
Date: Fri, 10 Feb 2006 17:04:11 +0100
Subject: [Linux-cluster] disjunct services in cluster
In-Reply-To: <20060210153405.GF31810@carbon.redbrick.dcu.ie>
References: <20060209231718.62e290d4.pegasus@nerv.eu.org>
	<20060210153405.GF31810@carbon.redbrick.dcu.ie>
Message-ID: <20060210170411.2315a8a1.pegasus@nerv.eu.org>

On Fri, 10 Feb 2006 15:34:05 +0000
Karl Podesta <kpodesta at redbrick.dcu.ie> wrote:

> What version of Cluster Suite are you running? 

The one for RHEL4.
 
> There is an option (checkbox) somewhere in the GUI of Cluster Suite 4 
> to allow "run exclusively on this node", I think in the section for
> adding a service. Can't remember seeing it in the documentation, I 
> remember it being hard to find.  

Yes, I've seen it, but that's not what I want.
 
> Alternatively you could try creating a failover domain specifically
> for that service, so that it will only failover (once started) onto 
> nodes you specify in that domain, although the above solution is the
> one you should probably use if you can find it.  

That would be an option, but I have a two node cluster, so failover domains do not give me this.

I'm thinkig of simply adding appropriate checks to init scripts of these two services, so if one is already up, the other would fail on start and the cluster is going to try starting it on the other node (hopefully).


-- 

Jure Pe?ar
http://jure.pecar.org



From gforte at leopard.us.udel.edu  Fri Feb 10 16:14:57 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Fri, 10 Feb 2006 11:14:57 -0500
Subject: [Linux-cluster] disjunct services in cluster
In-Reply-To: <20060210170411.2315a8a1.pegasus@nerv.eu.org>
References: <20060209231718.62e290d4.pegasus@nerv.eu.org>	<20060210153405.GF31810@carbon.redbrick.dcu.ie>
	<20060210170411.2315a8a1.pegasus@nerv.eu.org>
Message-ID: <43ECBC01.2070907@leopard.us.udel.edu>

If you have a two-node cluster, and you don't want both services running
on the same node, then if one node goes down your whole cluster is
defunct.  So ... what's the problem with using "run exclusively"?  I
realize this locks each service to a particular node, but if you don't
want them running on the same node then there's no functional difference
between that and what you want for a two-node cluster.

-g

Jure Pe?ar wrote:
> On Fri, 10 Feb 2006 15:34:05 +0000
> Karl Podesta <kpodesta at redbrick.dcu.ie> wrote:
> 
> 
>>What version of Cluster Suite are you running? 
> 
> 
> The one for RHEL4.
>  
> 
>>There is an option (checkbox) somewhere in the GUI of Cluster Suite 4 
>>to allow "run exclusively on this node", I think in the section for
>>adding a service. Can't remember seeing it in the documentation, I 
>>remember it being hard to find.  
> 
> 
> Yes, I've seen it, but that's not what I want.
>  
> 
>>Alternatively you could try creating a failover domain specifically
>>for that service, so that it will only failover (once started) onto 
>>nodes you specify in that domain, although the above solution is the
>>one you should probably use if you can find it.  
> 
> 
> That would be an option, but I have a two node cluster, so failover domains do not give me this.
> 
> I'm thinkig of simply adding appropriate checks to init scripts of these two services, so if one is already up, the other would fail on start and the cluster is going to try starting it on the other node (hopefully).
> 
> 



From pegasus at nerv.eu.org  Fri Feb 10 16:31:04 2006
From: pegasus at nerv.eu.org (Jure =?UTF-8?Q?Pe=C4=8Dar?=)
Date: Fri, 10 Feb 2006 17:31:04 +0100
Subject: [Linux-cluster] disjunct services in cluster
In-Reply-To: <43ECBC01.2070907@leopard.us.udel.edu>
References: <20060209231718.62e290d4.pegasus@nerv.eu.org>
	<20060210153405.GF31810@carbon.redbrick.dcu.ie>
	<20060210170411.2315a8a1.pegasus@nerv.eu.org>
	<43ECBC01.2070907@leopard.us.udel.edu>
Message-ID: <20060210173104.50493320.pegasus@nerv.eu.org>

On Fri, 10 Feb 2006 11:14:57 -0500
Greg Forte <gforte at leopard.us.udel.edu> wrote:

> If you have a two-node cluster, and you don't want both services running
> on the same node, then if one node goes down your whole cluster is
> defunct.  So ... what's the problem with using "run exclusively"?  I
> realize this locks each service to a particular node, but if you don't
> want them running on the same node then there's no functional difference
> between that and what you want for a two-node cluster.

Yes, I fully understand that, but my situation is a bit more complicated. I have to do some tests to see if I'm going to go with "run exclusively" or I'm going to hack init scripts.

Since I've dumped GFS for performance reasons, I went with one node having locally mounted volume and the other node mounting it from the first one via NFS. It's obvious that I want these two mounts be always on different nodes. Now the applications that are running atop of these are another story. Some, like postgres, are going to be tied to the "local" mount, others, like apache, can live anywhere. But I first have to sort out the local vs. nfs mount issue ...


-- 

Jure Pe?ar
http://jure.pecar.org



From kanderso at redhat.com  Fri Feb 10 16:38:23 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Fri, 10 Feb 2006 10:38:23 -0600
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line
	550 of file rgrp.c
In-Reply-To: <d4e2d9970602100759p5c2e1632mccc66bcdac3973a2@mail.gmail.com>
References: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
	<1139510101.3034.62.camel@dhcp80-219.msp.redhat.com>
	<d4e2d9970602100759p5c2e1632mccc66bcdac3973a2@mail.gmail.com>
Message-ID: <1139589503.2878.18.camel@dhcp80-219.msp.redhat.com>

On Fri, 2006-02-10 at 22:59 +0700, Thai Duong wrote:
> Hi Kevin,
> 
> I did unmount oracle_u02 before cloning but still no luck. 
> When I tried to run gfs_fsck against oracle_u02 on the backup
> cluster's node, it reported something like below:
> 
> [root at rac3 root]# gfs_fsck -y /dev/pool/oracle_u02
> Initializing fsck
> Buffer #65773 (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.

Add some -vvvvvvv flags to the gfs_fsck command line. Each "v" adds
another layer of messages.  The DEBUG messages are at layer 7. This
should print out more information about the resource group that it is
failing to read.

> Resource group is corrupted.
> Unable to read in rgrp descriptor.
> Unable to fill in resource group information.
> 
> It seems that oracle_u02 somehow got broken. Running gfs_fsck against
> oracle_u01 works like a charm. Do i need to run gfs_fsck against the
> original oracle_u02? Please advise.

It would be interesting to see if the partitions are identical after the
snapshot.  How large are the LUNs?  Can you do a comparison of the
volumes?  I would do those steps first before the fsck.  It is possible
you have a problem with the oracle_u02, so would be interesting to run
gfs_fsck if the snapped LUNs are identical.

Kevin





From thaidn at gmail.com  Fri Feb 10 17:25:05 2006
From: thaidn at gmail.com (Thai Duong)
Date: Sat, 11 Feb 2006 00:25:05 +0700
Subject: [Linux-cluster] Kernel panic: GFS: Assertion failed on line 550
	of file rgrp.c
In-Reply-To: <1139589503.2878.18.camel@dhcp80-219.msp.redhat.com>
References: <d4e2d9970602090917o543bb2b7j87add92ea82df0b8@mail.gmail.com>
	<1139510101.3034.62.camel@dhcp80-219.msp.redhat.com>
	<d4e2d9970602100759p5c2e1632mccc66bcdac3973a2@mail.gmail.com>
	<1139589503.2878.18.camel@dhcp80-219.msp.redhat.com>
Message-ID: <d4e2d9970602100925v7dd097f2m93f3dd913c59d117@mail.gmail.com>

Hi Kevin,

Thx for your real fast reply :). The result of running gfs_fsck with
serveral "v" is below:

[root at rac3 root]# gfs_fsck -vvvvvvvvvy /dev/pool/oracle_u02
Initializing fsck
Initializing lists...
Initializing special inodes...
(file.c:45)     readi:  Offset (320) is >= the file size (320).
(super.c:211)   4 journals found.
(file.c:45)     readi:  Offset (45888) is >= the file size (45888).
(super.c:268)   478 resource groups found.
(util.c:112)    For 65773 Expected 1161970:2 - got 0:0
Buffer #65773 (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.
Resource group is corrupted.
Unable to read in rgrp descriptor.
Unable to fill in resource group information.
(initialize.c:364)      <backtrace> - init_sbp()

>It would be interesting to see if the partitions are identical after the
>snapshot.  How large are the LUNs?  Can you do a comparison of the
>volumes?  I would do those steps first before the fsck.  It is possible
>you have a problem with the oracle_u02, so would be interesting to run
>gfs_fsck if the snapped LUNs are identical.

We use the SnapView feature of our EMC CX500 SAN so those two LUNs _should_
be identical. In fact, we have cloned other GFS LUNs many times in the past
without no problem. Tomorrow we'll drop the destination LUN and try again if
gfs_fsck can not help.

Regards,

Thai Duong.

On 2/10/06, Kevin Anderson <kanderso at redhat.com> wrote:
>
> On Fri, 2006-02-10 at 22:59 +0700, Thai Duong wrote:
> > Hi Kevin,
> >
> > I did unmount oracle_u02 before cloning but still no luck.
> > When I tried to run gfs_fsck against oracle_u02 on the backup
> > cluster's node, it reported something like below:
> >
> > [root at rac3 root]# gfs_fsck -y /dev/pool/oracle_u02
> > Initializing fsck
> > Buffer #65773 (1 of 5) is neither GFS_METATYPE_RB nor GFS_METATYPE_RG.
>
> Add some -vvvvvvv flags to the gfs_fsck command line. Each "v" adds
> another layer of messages.  The DEBUG messages are at layer 7. This
> should print out more information about the resource group that it is
> failing to read.
>
> > Resource group is corrupted.
> > Unable to read in rgrp descriptor.
> > Unable to fill in resource group information.
> >
> > It seems that oracle_u02 somehow got broken. Running gfs_fsck against
> > oracle_u01 works like a charm. Do i need to run gfs_fsck against the
> > original oracle_u02? Please advise.
>
> It would be interesting to see if the partitions are identical after the
> snapshot.  How large are the LUNs?  Can you do a comparison of the
> volumes?  I would do those steps first before the fsck.  It is possible
> you have a problem with the oracle_u02, so would be interesting to run
> gfs_fsck if the snapped LUNs are identical.
>
> Kevin
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060211/c73161b5/attachment.htm>

From iga.cnrst at gmail.com  Thu Feb  9 21:58:43 2006
From: iga.cnrst at gmail.com (hicham lotfi simo iga)
Date: Thu, 9 Feb 2006 21:58:43 +0000
Subject: [Linux-cluster] system-cluster-manager
Message-ID: <334ba50e0602091358i5eba7372m@mail.gmail.com>

i would ask about this rpm : system-cluster-manager of the Scientific Linux
Distribution if anyone knows how to use it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060209/99561ac2/attachment.htm>

From lhh at redhat.com  Fri Feb 10 18:56:32 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 10 Feb 2006 13:56:32 -0500
Subject: [Linux-cluster] anyone experience with an itanium cluster
	running oracle ?
In-Reply-To: <43ECA429.5040205@leopard.us.udel.edu>
References: <20060210142249.GU25789@core.amc.uva.nl>
	<43ECA429.5040205@leopard.us.udel.edu>
Message-ID: <1139597792.13901.47.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-10 at 09:33 -0500, Greg Forte wrote:
> I am slowly building something similar, though with 10g and ES and not
> Itanium (EM64T, though), and the outcome so far is a qualified "maybe".
> Oracle is mostly OK with failing over from node to node, using just init
> scripts I've crafted and the standard CS stuff - the database engine and
> the listener work fine, as long as I temporarily change the hostname
> during startup (and shutdown) of the services to the one it expects to
> see (which is an alias associated with a floating IP address that always
> gets assigned to the node running Oracle, anyway), but Enterprise
> Manager is not so happy - parts of it work, parts of it don't,
> specifically backup scheduling, patch grabbing/application, and
> communication with the listener.  Other services that use Oracle have
> been set up and are happy as clams, though.  I'm still waiting for some
> help from Oracle on what the issue might be with EM.  I'll post further
> info to the list as it comes ... if anyone else has been here before,
> please do pipe up!

Here's a RHCS4-ized Oracle iAS (infrastructure) script.  I think it
lives in CVS head.

An older version of this script was used for CFC certification on RHEL
2.1 - so things may have changed since then, but it's probably a
reasonable start.

I don't recall having to have the tweak the hostname; that sounds like a
new thing to me.  Also, I don't know all the possible ways to get Oracle
running -- chances are excellent that this won't work except for an "iAS
Infrastructure" installation.

-- Lon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oracleas
Type: application/x-shellscript
Size: 17530 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060210/90ddd95e/attachment.bin>

From lhh at redhat.com  Fri Feb 10 18:59:30 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 10 Feb 2006 13:59:30 -0500
Subject: [Linux-cluster] largest CIFS/Samba  export? largest GFS export?
In-Reply-To: <9D61626573F86843908C9AE8BDB4390555EB4D@jellyfish.highlyscyld.com>
References: <9D61626573F86843908C9AE8BDB4390555EB4D@jellyfish.highlyscyld.com>
Message-ID: <1139597970.13901.51.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-09 at 13:56 -0800, Michael Will wrote:
> What are current size limits for large storage pools. I was
> considering
> aggragating 20 2TB luns on a fibre SAN with LVM2 into one 40TB volume,
> create a filesystem on it with XFS and export it both with NFS v3 to
> linux clients
> as well as with samba to windows clients.
>  
> The linux clients are all 2.6 kernel 64bit, so they should not have an
> issue with this,
> but what about the 32bit windows clients running windows2003 and
> windowsxp ? 
>  
> Do you have any experience exporting larger than 2TB filesystems to
> windows clients through samba?
 
> Does GFS have similar restrictions (linux clients only assumed) ?

GFS is not like Samba or NFS - there's no 'client' and no 'file server'
with GFS.

You can use NFS to export GFS (but you can't use Samba to do it right
now).

-- Lon




From lhh at redhat.com  Fri Feb 10 19:02:58 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 10 Feb 2006 14:02:58 -0500
Subject: [Linux-cluster] strange modprobe drops VIP & fails service
In-Reply-To: <1139573771.6972.9.camel@linux.site>
References: <1139573771.6972.9.camel@linux.site>
Message-ID: <1139598178.13901.53.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-10 at 07:16 -0500, danwest wrote:
> Has anyone every seen something like this?  A modprobe with either a
> "module 0" or a blank "module " is always followed by the interface
> loosing its VIP and thus initiating a service failure.  All non
> essential system services were turned off.  Not sure what is
> calling/causing the strange modprobe.  Logs are clean, no other events
> leading up to this.  I am now wondering if it could be the cluster
> itself?
> 
> Feb  9 13:44:32 node2 modprobe: modprobe: Can't locate module 0
> Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: IP
> address 147.107.188.81 missing
> Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: 0: error
> fetching interface information: Device not found
> Feb  9 13:44:32 node2 clusvcmgrd: [18150]: <err> service error: Check
> status failed on IP addresses for mservice1

Sounds like the e1000 + multiple NIC problem.

-- Lon




From tgrman21 at gmail.com  Fri Feb 10 19:03:36 2006
From: tgrman21 at gmail.com (Curt Moore)
Date: Fri, 10 Feb 2006 13:03:36 -0600
Subject: [Linux-cluster] GNBD Configuration Question
Message-ID: <6d9358b0602101103v6ac3f0b7sc55f513b7489c7fb@mail.gmail.com>

Hello all.

I've been experimenting with RH Cluster Suite and GFS and have come
upon a few questions which I hope the list will be able to help. 
Kudos to all of the developers, RHCS and GFS are really cool!

To my question, I'm trying to setup a storage network with GFS and
GNBD using a 3 layered approach as shown in Figure 5 in the following
link:

http://www.redhat.com/magazine/008jun05/features/gfs/#fig=multipath

and also here:

http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/s1-ov-perform.html#S2-OV-MODPRICE

Obviously, the intent is to eliminate any SPOF for the storage network.

For the sake of example, let's say that I have 2 GNBD servers
connected directly to the SAN, snode001 and snode002.

If I export the the same SAN block device from these 2 GNBD servers
for load sharing purposes and snode001 fails, how do the GNBD clients
importing that block device from snode001 know that they can also find
that block device on snode002?  Is this somehow handled at a lower
level by configuring a resource within the RH Cluster Suite?

I've scoured the list archives and found the following example, using
multipath, which seems to come very close:

http://www.redhat.com/archives/linux-cluster/2005-April/msg00065.html

However, the documentation states that multipath GNBD cannot be used
with GFS 6.1:
http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/ch-gnbd.html

Is there another way of accomplishing this without using multipath or
am I misunderstanding the concept of how multipath is utilized in this
setup?

Any feedback would be appreciated.

Thanks!
-Curt



From adingman at cook-inc.com  Fri Feb 10 21:10:50 2006
From: adingman at cook-inc.com (Andrew C. Dingman)
Date: Fri, 10 Feb 2006 16:10:50 -0500
Subject: [Linux-cluster] Fencing agents on DELL PowerEdge servers
In-Reply-To: <43EB5D61.8010008@webbertek.com.br>
References: <43EB5D61.8010008@webbertek.com.br>
Message-ID: <1139605850.11236.6.camel@adingman.cin.cook>

I'm not entirely sure I know which cluster suite versions you mean, but
using the cluster suite sold for with RHEL 3 all the fencing devices
were Perl scripts. I managed to write a couple of my own using the
examples, one a modified agent for APC 7901 power strips and one for
DRAM/MC devices in blad chassis. The DRAC/MC one is vastly simpler, and
it seems that one for a regular drac would be even a bit easier. Cluster
Suite in RHEL 4 supports DRACs out of the box in the GUI, though I
haven't tested it.

If you want my dracmc fencing agent as a starting point for your own
DRAC fencing agent, you can find it posted to this list about a year
ago. There's also one that was in CVS at the time using Dell's "racadm"
utility.

On Thu, 2006-02-09 at 13:18 -0200, Celso K. Webber wrote:
> Hello all,
> 
> Would someone please tell me if there is a way to use a Dell DRAC card 
> as a fencing device under Red Hat Cluster Suite v3? I could successfully 
> configure the DRACs and Cluster Suite v4 to use the Dell server's DRACs 
> as fencing devices under Red Hat Cluster Suite v4, but I could not find 
> a way to do that under CS v3 since the fencing agents are in binary 
> format, not in shell script format as they are in CS v4.
> 
> Alternatively, does anyone know if it is possible to use Dell's 
> "ipmitool" and "ipmish" commands over the server's BMC IP so that I 
> could build a fencing agent script for Cluster Suite v4?
> 
> Thanks for your advice.
> 
> Celso.
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From omer at faruk.net  Sat Feb 11 21:57:32 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Sat, 11 Feb 2006 23:57:32 +0200 (EET)
Subject: [Linux-cluster] service restart problem
Message-ID: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>


Hi,

I have a problem with redhat cluster suite. I have a two node test
cluster. cluster master is cluster2 which is running vsftpd, mysql as
service. I have manually edited vsftpd.conf so it can't start on cluster2.
Then I killed vsftpd process. But after that cluster didn't failover to
cluster1 or after a few seconds I have corrected vsftpd.conf but cluster2
didn't start this service. What I want to ask that is redhat cluster
doesn't support service status check so it can restart failed service or
does it support fail over if one resource doesn't work?

Best regards,




From tekion at gmail.com  Sat Feb 11 23:22:51 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Sat, 11 Feb 2006 18:22:51 -0500
Subject: [Linux-cluster] Trying to compile GFS for FC3...
Message-ID: <ee9c961f0602111522v203e40acp4e9f7d4adcca6c61@mail.gmail.com>

All,
I am getting this when compiling for the cluster source for FC3:

cd dlm-kernel && make
make[1]: Entering directory `/root/.src/cluster/dlm-kernel'
cd src2 && make all
make[2]: Entering directory `/root/.src/cluster/dlm-kernel/src2'
make -C /lib/modules/2.6.12-1.1378_FC3/buildM=/root/.src/cluster/dlm-kernel/src2
modules USING_KBUILD=yes
make[3]: Entering directory `/lib/modules/2.6.12-1.1378_FC3/build'
  CC [M]  /root/.src/cluster/dlm-kernel/src2/config.o
/root/.src/cluster/dlm-kernel/src2/config.c:16:28: linux/configfs.h: No such
file or directory
/root/.src/cluster/dlm-kernel/src2/config.c:44: warning: "struct
config_item" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:44: warning: its scope is only
this definition or declaration, which is probably not what you want
/root/.src/cluster/dlm-kernel/src2/config.c:45: warning: "struct
config_item" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:47: warning: "struct
config_item" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:48: warning: "struct
config_item" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:57: warning: "struct
configfs_attribute" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:59: warning: "struct
configfs_attribute" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:61: warning: "struct
configfs_attribute" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:63: warning: "struct
configfs_attribute" declared inside parameter list
/root/.src/cluster/dlm-kernel/src2/config.c:82: error: field `attr' has
incomplete type
/root/.src/cluster/dlm-kernel/src2/config.c:88: warning: braces around
scalar initializer
/root/.src/cluster/dlm-kernel/src2/config.c:88: warning: (near
initialization for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:88: error: field name not in
record or union initializer
/root/.src/cluster/dlm-kernel/src2/config.c:88: error: (near initialization
for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:89: error: field name not in
record or union initializer
/root/.src/cluster/dlm-kernel/src2/config.c:89: error: (near initialization
for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:89: warning: excess elements in
scalar initializer
/root/.src/cluster/dlm-kernel/src2/config.c:89: warning: (near
initialization for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:90: error: field name not in
record or union initializer
/root/.src/cluster/dlm-kernel/src2/config.c:90: error: (near initialization
for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:90: warning: excess elements in
scalar initializer
/root/.src/cluster/dlm-kernel/src2/config.c:90: warning: (near
initialization for `comm_attr_nodeid.attr')
/root/.src/cluster/dlm-kernel/src2/config.c:91: confused by earlier errors,
bailing out
make[4]: *** [/root/.src/cluster/dlm-kernel/src2/config.o] Error 1
make[3]: *** [_module_/root/.src/cluster/dlm-kernel/src2] Error 2
make[3]: Leaving directory `/lib/modules/2.6.12-1.1378_FC3/build'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/root/.src/cluster/dlm-kernel/src2'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/.src/cluster/dlm-kernel'
make: *** [all] Error 2

The configure command is run as: ./configure
--kernel_src=/lib/modules/`uname -r`/build

Any idea what it is bombing out like this?

BTW, I have google for any docs related to getting GFS to work on FC3, so
far no luck, any docs would be nice.  Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060211/54cda2b7/attachment.htm>

From chowdary_9 at hotmail.com  Sun Feb 12 10:24:21 2006
From: chowdary_9 at hotmail.com (kishore chowdary)
Date: Sun, 12 Feb 2006 15:54:21 +0530
Subject: [Linux-cluster] Redhat web server cluster with 4 web server's & 2
	Database server
In-Reply-To: <mailman.0.1139739564.19198.linux-cluster@redhat.com>
Message-ID: <BAY111-F1143DB5985482F03EDE5BB4040@phx.gbl>

Hi guys

I am new to this place.infact this is my first post.we r planning to deploy 
a web server cluster using redhat cluster suite.

we r planning to go for 4 web server's & two database server's(Mysql in 
master slave mode) for redundancy

Hardware available

webserver's 4 No's:                             Dell PowerEdge 1850 Xeon 3.0 
*2Ghz,800 FSB
                                                                146*2 SCSI 
Ultra320 (10,000 rpm) 2 GB Ram

DB Server's 2 No's:                            Dell PowerEdge 1850 Xeon 3.0 
*2Ghz,800 FSB
                                                               146*2 SCSI 
Ultra320 (10,000 rpm) 4 GB Ram

one DB server will be set up in master mode & the other in slave mode for 
redundancy & backup ncase the main DB server goes down

Expected web traffic:                           700-1000 hits per second max

Most of the content served will be static content all these server r in a 
good datacenter & r interconected through gigabit ethernet.

We have licensed version of Redhat AS 4.2 for all the server's & planning to 
buy Redhat Cluster Suite from redhat.

so i need some guidance from u guys on how to setup this cluster,which file 
system to use.if u have step by step installation notes of starting from 
redhat OS installation to setting up cluster with mysql redundancy.

Do i need any hardware apart from this to set up a cluster.

Please let me know if u have any details reg this.

Thanx & Regards
Santhi Kishore




From libregeek at gmail.com  Mon Feb 13 05:20:50 2006
From: libregeek at gmail.com (Manilal K M)
Date: Mon, 13 Feb 2006 10:50:50 +0530
Subject: [Linux-cluster] Where's the documentation?
In-Reply-To: <20060209155234.GB10253@csail.mit.edu>
References: <20060209155234.GB10253@csail.mit.edu>
Message-ID: <2315046d0602122120q39d50c96i@mail.gmail.com>

On 09/02/06, Jonathan D. Proulx <jon at csail.mit.edu> wrote:
>
> I have a piel of dumb questions that are best answered with RTFM, but
> the only FM I seem to have is a couple of small README files in the
> source diurectory.
>
> [jon at borg1 src]$ ls cluster-1.01.00/doc/
> min-gfs.txt  usage.txt
>
> This was enough for me to get it working but I still have a
> particularly poor understanding of fencing and shut down procedure (I
> can get teh system to start cold, but if I bring a node down it
> hangs.  seems to still be trying to leave the cluster after the
> network goes down, bu tI have the clust shutdown scripts way before
> the network shutdown scripts).
>
> The only reference I can find in the archives points to the same two
> documents (which are marvels of brevity, and as I said did get me
> working).  Is there anything a bit more in depth?
>

Look at this one also. It may help you.
http://www.redhat.com/docs/manuals/csgfs/

regards
Manilal



From Anthony.Assi at irisa.fr  Mon Feb 13 09:26:41 2006
From: Anthony.Assi at irisa.fr (Anthony Assi)
Date: Mon, 13 Feb 2006 10:26:41 +0100
Subject: [Linux-cluster] Users Password storage
Message-ID: <43F050D1.1080404@irisa.fr>


Hi,

i would like to have an advise on the fact whether i should or should 
not store my users passwords,
and if so, where to save them? (like on a usb Key, on a crypted password 
protected file,etc...)


Regards,
Anthony.



From f_schliefer at vcc.de  Mon Feb 13 10:04:02 2006
From: f_schliefer at vcc.de (Frank Schliefer)
Date: Mon, 13 Feb 2006 11:04:02 +0100
Subject: [Linux-cluster] Node lag
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E06@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E06@tmaemail.techma.com>
Message-ID: <43F05992.5050401@vcc.de>

Hi,

yeah your right, wrong file size:

Here are the test results with a 2048MB file size. The Raid itself holds 
1024MB Cache in RAM.

# tiotest -f 2048
Tiotest results for 4 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        8192 MBs |   58.6 s | 139.801 MB/s |  28.0 %  | 987.1 % |
| Random Write   16 MBs |    1.9 s |   8.435 MB/s |   0.9 %  |  29.6 % |
| Read         8192 MBs |   54.9 s | 149.176 MB/s |  16.6 %  | 171.7 % |
| Random Read    16 MBs |   10.4 s |   1.509 MB/s |   0.2 %  |   4.1 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        0.108 ms |      980.220 ms |  0.00000 |   0.00000 |
| Random Write |        1.237 ms |      198.483 ms |  0.00000 |   0.00000 |
| Read         |        0.104 ms |      185.499 ms |  0.00000 |   0.00000 |
| Random Read  |       10.178 ms |      116.995 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        0.117 ms |      980.220 ms |  0.00000 |   0.00000 |
`--------------+-----------------+-----------------+----------+-----------'

So the IO is mainly the same on all nodes with the right file size, but 
the question still is why is the random read/write performance so bad !!


More Infos about the Systems as:
Each Node got 2048MB RAM and dual Xeon CPU.
As FC Controller we are using are QLogic Corp. QLA2312
As a Switch and for fencing the Qlogic 5202.
The Raid itself is an easyRAID Q16+ with 16 Disk and it performance very 
well under eg XFS.

Any further hints ?

-- 
----

Frank Schliefer


Kovacs, Corey J. schrieb:
> Also, I think it might be interesting to see what happens when you use data
> sizes that
> will overrun any cacheing being done. I've seen great performance using a
> simple MSA1000
> as long as there is a lot of cache available on the SAN itself. As soon as I
> run tests with
> data sets larger then the cache size, the performance falls to the floor.
> Unless your over
> loading the cache, you might not be getting a true metric of whats really
> getting written 
> to disk.
> 
> Maybe the slow node is getting hit by cache overhead from the SAN? 
> 
> 
> Just a thought
> 
> 
> Corey
> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
> Sent: Thursday, February 09, 2006 9:18 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Node lag
> 
> Frank Schliefer wrote:
> 
>>Hi,
>>
>>after setting up an four node cluster we have one node that is way 
>>slower than the other 3 nodes.
>>
>>We using eg. tiotest for benchmarking the GFS.
>>
>>Normal Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write          40 MBs |    0.2 s | 227.426 MB/s |  36.4 %  | 384.4 % |
>>| Random Write   16 MBs |    0.1 s | 143.405 MB/s |  58.7 %  | 146.9 % |
>>| Read           40 MBs |    0.0 s | 2558.199 MB/s | 307.0 %  | 1228.0 % |
>>| Random Read    16 MBs |    0.0 s | 2685.169 MB/s | 550.0 %  | 1374.9 % |
>>`----------------------------------------------------------------------'
>>
>>
>>Slow Node:
>>Tiotest results for 4 concurrent io threads:
>>,----------------------------------------------------------------------.
>>| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
>>+-----------------------+----------+--------------+----------+---------+
>>| Write          40 MBs |    1.4 s |  27.687 MB/s |   2.2 %  | 121.8 % |
>>| Random Write   16 MBs |    4.2 s |   3.695 MB/s |   0.0 %  |   7.9 % |
>>| Read           40 MBs |    0.0 s | 2228.288 MB/s |  89.1 %  | 1337.1 % |
>>| Random Read    16 MBs |    0.0 s | 2252.739 MB/s | 230.7 %  | 692.1 % |
>>`----------------------------------------------------------------------'
>>
>>any hints why this could happen ??
>>
>>Using kernel 2.6.15.2 (sorry no RH)
> 
> 
> It would be helpful if you could give us more information about your
> installation: disk topology, lock manager in use (and which nodes are
> lockservers if using GULM) and whether it matters which nodes are started
> first or not.
> 



From thaidn at gmail.com  Mon Feb 13 16:23:41 2006
From: thaidn at gmail.com (Thai Duong)
Date: Mon, 13 Feb 2006 23:23:41 +0700
Subject: [Linux-cluster] May I turn on direct I/O and/or async I/O when
	using GFS as the cluster file system for Oracle9i RAC?
Message-ID: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>

Hi there,

I'm using Red Hat GFS 6.0 as the cluster file system for a 4-node Oracle RAC
9.2.0.7, each node is running RHAS 3 Update 6. There're probably a lot of
benefits in using direct I/O and/or async I/O with Oracle9i RAC. RH GFS
itself supports direct I/O regardless of the application running on top of
it. I have asked a Oracle Support Service guy and he confirms that we can
use direct I/O feature of RH GFS but he doest have information about async
I/O. May I turn on direct I/O and/or async I/O (I mean recompile Oracle to
support async I/O) when using GFS as the cluster file system for Oracle9i
RAC? Do you have any experience with this? Please help.

TIA,

Thai Duong.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060213/6937ed4e/attachment.htm>

From wcheng at redhat.com  Mon Feb 13 16:35:17 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 13 Feb 2006 11:35:17 -0500
Subject: [Linux-cluster] May I turn on direct I/O and/or async I/O when
	using GFS as the cluster file system for Oracle9i RAC?
In-Reply-To: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>
References: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>
Message-ID: <43F0B545.3020708@redhat.com>

Thai Duong wrote:

> Hi there,
>
> I'm using Red Hat GFS 6.0 as the cluster file system for a 4-node 
> Oracle RAC 9.2.0.7 <http://9.2.0.7>, each node is running RHAS 3 
> Update 6. There're probably a lot of benefits in using direct I/O 
> and/or async I/O with Oracle9i RAC. RH GFS itself supports direct I/O 
> regardless of the application running on top of it. I have asked a 
> Oracle Support Service guy and he confirms that we can use direct I/O 
> feature of RH GFS but he doest have information about async I/O. May I 
> turn on direct I/O and/or async I/O (I mean recompile Oracle to 
> support async I/O) when using GFS as the cluster file system for 
> Oracle9i RAC? Do you have any experience with this? Please help.
>
Direct IO is ok for both GFS 6.0 and GFS 6.1. Asyn IO is enabled in >=  
RHEL 4 (GFS 6.1) Update 3 (in beta testing).

Also there could have deadlock issues with Direct IO with Oracle under 
GFS 6.1 (<= RHEL 4 U2).

-- Wendy




From thaidn at gmail.com  Mon Feb 13 17:41:47 2006
From: thaidn at gmail.com (Thai Duong)
Date: Tue, 14 Feb 2006 00:41:47 +0700
Subject: [Linux-cluster] May I turn on direct I/O and/or async I/O when
	using GFS as the cluster file system for Oracle9i RAC?
In-Reply-To: <43F0B545.3020708@redhat.com>
References: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>
	<43F0B545.3020708@redhat.com>
Message-ID: <d4e2d9970602130941wbef0c29wfa041841783a7c7f@mail.gmail.com>

Hi Wendy,

Thx for the information. I'll turn on direct I/O on my system. Do you have
any information about how many percent I should gain in term of I/O
performance when turning direct I/O on in general and with Oracle RAC in
particular?

TIA,

--Thai Duong.

On 2/13/06, Wendy Cheng <wcheng at redhat.com> wrote:
>
> Thai Duong wrote:
>
> > Hi there,
> >
> > I'm using Red Hat GFS 6.0 as the cluster file system for a 4-node
> > Oracle RAC 9.2.0.7 <http://9.2.0.7>, each node is running RHAS 3
> > Update 6. There're probably a lot of benefits in using direct I/O
> > and/or async I/O with Oracle9i RAC. RH GFS itself supports direct I/O
> > regardless of the application running on top of it. I have asked a
> > Oracle Support Service guy and he confirms that we can use direct I/O
> > feature of RH GFS but he doest have information about async I/O. May I
> > turn on direct I/O and/or async I/O (I mean recompile Oracle to
> > support async I/O) when using GFS as the cluster file system for
> > Oracle9i RAC? Do you have any experience with this? Please help.
> >
> Direct IO is ok for both GFS 6.0 and GFS 6.1. Asyn IO is enabled in >=
> RHEL 4 (GFS 6.1) Update 3 (in beta testing).
>
> Also there could have deadlock issues with Direct IO with Oracle under
> GFS 6.1 (<= RHEL 4 U2).
>
> -- Wendy
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060214/85e25d76/attachment.htm>

From wcheng at redhat.com  Mon Feb 13 18:54:42 2006
From: wcheng at redhat.com (Wendy Cheng)
Date: Mon, 13 Feb 2006 13:54:42 -0500
Subject: [Linux-cluster] May I turn on direct I/O and/or async I/O when
	using GFS as the cluster file system for Oracle9i RAC?
In-Reply-To: <d4e2d9970602130941wbef0c29wfa041841783a7c7f@mail.gmail.com>
References: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>	<43F0B545.3020708@redhat.com>
	<d4e2d9970602130941wbef0c29wfa041841783a7c7f@mail.gmail.com>
Message-ID: <43F0D5F2.9040906@redhat.com>

Thai Duong wrote:

> Hi Wendy,
>
> Thx for the information. I'll turn on direct I/O on my system. Do you 
> have any information about how many percent I should gain in term of 
> I/O performance when turning direct I/O on in general and with Oracle 
> RAC in particular?
>
Our performance group did pass the numbers around sometime ago - the 
gain seemed to be quite positive but I don't remember the details. On 
the other hand, do remember this is not a frequent execution code path 
so try to take the switching steps slowly (i.e. make sure you have 
backup plan, etc, particularly on GFS 6.0 system).

-- Wendy



From michaelc at cs.wisc.edu  Mon Feb 13 19:39:41 2006
From: michaelc at cs.wisc.edu (Mike Christie)
Date: Mon, 13 Feb 2006 13:39:41 -0600
Subject: [Linux-cluster] Re: Doubts+Questions about iSCSI+LVM+GFS;
	multiple targets->one host
In-Reply-To: <1139837072.429781.158420@z14g2000cwz.googlegroups.com>
References: <1139837072.429781.158420@z14g2000cwz.googlegroups.com>
Message-ID: <43F0E07D.9070906@cs.wisc.edu>

michael.weitzel at uni-siegen.de wrote:
> Hi,
> 
> has anyone tried Open-iSCSI together with LVM and GFS (Global File
> System; Redhat open source).
> 
> I have a cluster of 8 nodes (x86_64) where 4 nodes export an IDE disk
> via ietd. The IDE disks were labeled by LVM's pvcreate and build a
> volume group with a single logical volume. This volume group can be
> seen from all nodes (nodes which export their own IDE drive use the
> native block dev instead of the SCSI-dev imported with open-iscsi).
> 
> I've compiled and installed redhat's GFS and CLVMD to use this
> distributed volume group on all nodes simultaneously. Creating and
> mounting the GFS volume works but the filesystem gets corrupted after
> some testing. Any ideas? Can I disable write-caching for the
> iSCSI-devices? Has anyone used iSCSI+GFS before?

iSCSI and GFS should work. It does when using GFS and the older iscsi 
driver. I have not tried it with open-iscsi though. I did not imagine 
there would be a problem.

> 
> Another problem: open-iscsi creates a new scsi-host for each imported
> disk. Is it possible to "bundle" the devices on a single host?

What target are you using?



From marc at blarg.net  Mon Feb 13 19:56:20 2006
From: marc at blarg.net (Marc Lewis)
Date: Mon, 13 Feb 2006 11:56:20 -0800
Subject: [Linux-cluster] service restart problem
In-Reply-To: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
Message-ID: <20060213195619.GA23750@opus.corp.blarg.net>

On Sat, Feb 11, 2006 at 11:57:32PM +0200, Omer Faruk Sen wrote:
> 
> Hi,
> 
> I have a problem with redhat cluster suite. I have a two node test
> cluster. cluster master is cluster2 which is running vsftpd, mysql as
> service. I have manually edited vsftpd.conf so it can't start on cluster2.
> Then I killed vsftpd process. But after that cluster didn't failover to
> cluster1 or after a few seconds I have corrected vsftpd.conf but cluster2
> didn't start this service. What I want to ask that is redhat cluster
> doesn't support service status check so it can restart failed service or
> does it support fail over if one resource doesn't work?
> 
> Best regards,

I'm seeing similar issues here.  The script entry doesn't seem to do
anything when checking status.

For example, we have a MySQL service defined with an IP address, a shared
SAN partition, and the /etc/init.d/mysqld script. 

The service starts up and shuts down fine when done manually via clusvcadm,
but if I kill the mysql daemon with the script or manually, the clurgmgrd
doesn't seem to care.  It just runs its status check, which does report it
as "stopped" without ever restarting the service.

Also, I've seen clurgmgrd die without logging anything anywhere.  I'll just
check the cluster and it won't be running.  All of the services stay
running, but the manager is dead.  Restarting it is problematic since it
will restart each of the services causing a brief interruption.

Anyone have any ideas on how to solve either of these two problems?  I've
been waiting to deploy the cluster we've put together until I could resolve
these two issues, but have run out of things to try.

 - Marc

-- 
Marc Lewis
Blarg! Online Services, Inc.



From bmarzins at redhat.com  Mon Feb 13 20:02:59 2006
From: bmarzins at redhat.com (Benjamin Marzinski)
Date: Mon, 13 Feb 2006 14:02:59 -0600
Subject: [Linux-cluster] GNBD Configuration Question
In-Reply-To: <6d9358b0602101103v6ac3f0b7sc55f513b7489c7fb@mail.gmail.com>
References: <6d9358b0602101103v6ac3f0b7sc55f513b7489c7fb@mail.gmail.com>
Message-ID: <20060213200258.GA25174@ether.msp.redhat.com>

On Fri, Feb 10, 2006 at 01:03:36PM -0600, Curt Moore wrote:
> Hello all.
> 
> I've been experimenting with RH Cluster Suite and GFS and have come
> upon a few questions which I hope the list will be able to help. 
> Kudos to all of the developers, RHCS and GFS are really cool!
> 
> To my question, I'm trying to setup a storage network with GFS and
> GNBD using a 3 layered approach as shown in Figure 5 in the following
> link:
> 
> http://www.redhat.com/magazine/008jun05/features/gfs/#fig=multipath
> 
> and also here:
> 
> http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/s1-ov-perform.html#S2-OV-MODPRICE
> 
> Obviously, the intent is to eliminate any SPOF for the storage network.
> 
> For the sake of example, let's say that I have 2 GNBD servers
> connected directly to the SAN, snode001 and snode002.
> 
> If I export the the same SAN block device from these 2 GNBD servers
> for load sharing purposes and snode001 fails, how do the GNBD clients
> importing that block device from snode001 know that they can also find
> that block device on snode002?  Is this somehow handled at a lower
> level by configuring a resource within the RH Cluster Suite?
> 
> I've scoured the list archives and found the following example, using
> multipath, which seems to come very close:
> 
> http://www.redhat.com/archives/linux-cluster/2005-April/msg00065.html
> 
> However, the documentation states that multipath GNBD cannot be used
> with GFS 6.1:
> http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/ch-gnbd.html
> 
> Is there another way of accomplishing this without using multipath or
> am I misunderstanding the concept of how multipath is utilized in this
> setup?

The only way for a gnbd client to access the same data served up by two gnbd
servers is to use some multipath implementation. Currently there is no
mulipath implementation that supports gnbd devicse. sorry.

-Ben

 
> Any feedback would be appreciated.
> 
> Thanks!
> -Curt
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From tgrman21 at gmail.com  Mon Feb 13 21:27:37 2006
From: tgrman21 at gmail.com (Curt Moore)
Date: Mon, 13 Feb 2006 15:27:37 -0600
Subject: [Linux-cluster] GNBD Configuration Question
In-Reply-To: <20060213200258.GA25174@ether.msp.redhat.com>
References: <6d9358b0602101103v6ac3f0b7sc55f513b7489c7fb@mail.gmail.com>
	<20060213200258.GA25174@ether.msp.redhat.com>
Message-ID: <6d9358b0602131327l8480e94qf81f6cd7f88cf10c@mail.gmail.com>

On 2/13/06, Benjamin Marzinski <bmarzins at redhat.com> wrote:
> On Fri, Feb 10, 2006 at 01:03:36PM -0600, Curt Moore wrote:
> > Hello all.
> >
> > I've been experimenting with RH Cluster Suite and GFS and have come
> > upon a few questions which I hope the list will be able to help.
> > Kudos to all of the developers, RHCS and GFS are really cool!
> >
> > To my question, I'm trying to setup a storage network with GFS and
> > GNBD using a 3 layered approach as shown in Figure 5 in the following
> > link:
> >
> > http://www.redhat.com/magazine/008jun05/features/gfs/#fig=multipath
> >
> > and also here:
> >
> > http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/s1-ov-perform.html#S2-OV-MODPRICE
> >
> > Obviously, the intent is to eliminate any SPOF for the storage network.
> >
> > For the sake of example, let's say that I have 2 GNBD servers
> > connected directly to the SAN, snode001 and snode002.
> >
> > If I export the the same SAN block device from these 2 GNBD servers
> > for load sharing purposes and snode001 fails, how do the GNBD clients
> > importing that block device from snode001 know that they can also find
> > that block device on snode002?  Is this somehow handled at a lower
> > level by configuring a resource within the RH Cluster Suite?
> >
> > I've scoured the list archives and found the following example, using
> > multipath, which seems to come very close:
> >
> > http://www.redhat.com/archives/linux-cluster/2005-April/msg00065.html
> >
> > However, the documentation states that multipath GNBD cannot be used
> > with GFS 6.1:
> > http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/ch-gnbd.html
> >
> > Is there another way of accomplishing this without using multipath or
> > am I misunderstanding the concept of how multipath is utilized in this
> > setup?
>
> The only way for a gnbd client to access the same data served up by two gnbd
> servers is to use some multipath implementation. Currently there is no
> mulipath implementation that supports gnbd devicse. sorry.
>
> -Ben
>

Ben,

Thanks for your reply.

That's a bummer.  So, older versions of GFS supported multipath but
6.1 somehow broke this?

I suppose I could accomplish this using iSCSI SAN devices or perhaps
even setup a few servers running as iSCSI targets which are in turn
physically connected to the FC SAN.  Either of these scenarios should
support multipath, correct?

Has anyone had good or bad experiences when trying to run multipath on
iSCSI devices for a fault tolerant SAN?

Thanks,
-Curt

>
> > Any feedback would be appreciated.
> >
> > Thanks!
> > -Curt
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From marc at blarg.net  Tue Feb 14 00:40:03 2006
From: marc at blarg.net (Marc Lewis)
Date: Mon, 13 Feb 2006 16:40:03 -0800
Subject: [Linux-cluster] service restart problem
In-Reply-To: <20060213195619.GA23750@opus.corp.blarg.net>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
Message-ID: <20060214004003.GE23750@opus.corp.blarg.net>

On Mon, Feb 13, 2006 at 11:56:20AM -0800, Marc Lewis wrote:
> On Sat, Feb 11, 2006 at 11:57:32PM +0200, Omer Faruk Sen wrote:
> > 
> > Hi,
> > 
> > I have a problem with redhat cluster suite. I have a two node test
> > cluster. cluster master is cluster2 which is running vsftpd, mysql as
> > service. I have manually edited vsftpd.conf so it can't start on cluster2.
> > Then I killed vsftpd process. But after that cluster didn't failover to
> > cluster1 or after a few seconds I have corrected vsftpd.conf but cluster2
> > didn't start this service. What I want to ask that is redhat cluster
> > doesn't support service status check so it can restart failed service or
> > does it support fail over if one resource doesn't work?
> > 
> > Best regards,
> 
> I'm seeing similar issues here.  The script entry doesn't seem to do
> anything when checking status.
> 
> For example, we have a MySQL service defined with an IP address, a shared
> SAN partition, and the /etc/init.d/mysqld script. 
> 
> The service starts up and shuts down fine when done manually via clusvcadm,
> but if I kill the mysql daemon with the script or manually, the clurgmgrd
> doesn't seem to care.  It just runs its status check, which does report it
> as "stopped" without ever restarting the service.

Just wanted to followup and say that I've solved the status check problem,
sort of.  I decided to play with the exit value of various init scripts to
see what, if any, effect they would have on clurgmgrd, and managed to get
something cobbled together that works.  Its not the best best solution, but
it should do.

To get these scripts to work, its important that the "status" and "stop"
return values that clurgmgrd can deal with.

If status returns a non-zero value (i.e. an error) then clurgmgrd will
think the service has failed and attempt to restart it.  It does this by
issuing a "stop" command, taking down the other resources associated with
it, and then bringing them all back up.

Its the stop command that can cause some problems.  For examample, in my
service defined above, I have the service "MySQL", which has the IP
address, shared storage and the /etc/init.d/mysqld script.  I start it up
using "clusvcadm -e MySQL" and all is well, it brings everything up in the
correct order and MySQL is running fine.  Every 30 seconds, I see it
running a "status" check in the syslog.  So far so good.

Now, I have modified the mysqld script to return the value of "status" from
/etc/init.d/functions as its exit code.  So, when everything is running
fine, it returns "0" and clurgmgrd is happy.  If I do a "killall -9
mysqld_safe mysqld" status will now return a value of 2, which is an error.
clurgmgrd will attempt to restart it by issuing the "stop" command to the
script.  This is where we run into problems.

Since the service is already dead, the startup scripts return an error when
trying to stop the service.  clurgmgrd fails the service and the service is
now down.

The only way I've found around this is to force the "stop" to return 0 no
matter what.  This way clurgmgrd will believe it has succeeded in shutting
down the service and will restart it.

My reasoning is that it is better to have it fail starting it up than to
have it fail stopping a service that is already dead.  I'm sure there are
other problems with this method, but I haven't identified them yet.

> Also, I've seen clurgmgrd die without logging anything anywhere.  I'll just
> check the cluster and it won't be running.  All of the services stay
> running, but the manager is dead.  Restarting it is problematic since it
> will restart each of the services causing a brief interruption.
> 
> Anyone have any ideas on how to solve either of these two problems?  I've
> been waiting to deploy the cluster we've put together until I could resolve
> these two issues, but have run out of things to try.

I'm still seeing clurgmgrd die periodically for no reason, though.  I may
have to write another script to monitor it as well and run that out of cron
every so often.  That doesn't seem like a very good solution, though since
it does restart all of the services that are running on that node.

 - Marc

-- 
Marc Lewis
Blarg! Online Services, Inc.



From omer at faruk.net  Tue Feb 14 07:53:40 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Tue, 14 Feb 2006 09:53:40 +0200 (EET)
Subject: [Linux-cluster] service restart problem
In-Reply-To: <20060214004003.GE23750@opus.corp.blarg.net>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
	<20060214004003.GE23750@opus.corp.blarg.net>
Message-ID: <63321.193.140.74.2.1139903620.squirrel@193.140.74.2>

Isn't it peculiar that it hasn't realised before. This problem seems to
exist since the release of rhel 4.2 (as far as I have realised) or maybe
earlier..

I hope this issue gets solved as soon as possible. By the way thanks for
the workaround Marc..



> On Mon, Feb 13, 2006 at 11:56:20AM -0800, Marc Lewis wrote:
>> On Sat, Feb 11, 2006 at 11:57:32PM +0200, Omer Faruk Sen wrote:
>> >
>> > Hi,
>> >
>> > I have a problem with redhat cluster suite. I have a two node test
>> > cluster. cluster master is cluster2 which is running vsftpd, mysql as
>> > service. I have manually edited vsftpd.conf so it can't start on
>> cluster2.
>> > Then I killed vsftpd process. But after that cluster didn't failover
>> to
>> > cluster1 or after a few seconds I have corrected vsftpd.conf but
>> cluster2
>> > didn't start this service. What I want to ask that is redhat cluster
>> > doesn't support service status check so it can restart failed service
>> or
>> > does it support fail over if one resource doesn't work?
>> >
>> > Best regards,
>>
>> I'm seeing similar issues here.  The script entry doesn't seem to do
>> anything when checking status.
>>
>> For example, we have a MySQL service defined with an IP address, a
>> shared
>> SAN partition, and the /etc/init.d/mysqld script.
>>
>> The service starts up and shuts down fine when done manually via
>> clusvcadm,
>> but if I kill the mysql daemon with the script or manually, the
>> clurgmgrd
>> doesn't seem to care.  It just runs its status check, which does report
>> it
>> as "stopped" without ever restarting the service.
>
> Just wanted to followup and say that I've solved the status check problem,
> sort of.  I decided to play with the exit value of various init scripts to
> see what, if any, effect they would have on clurgmgrd, and managed to get
> something cobbled together that works.  Its not the best best solution,
> but
> it should do.
>
> To get these scripts to work, its important that the "status" and "stop"
> return values that clurgmgrd can deal with.
>
> If status returns a non-zero value (i.e. an error) then clurgmgrd will
> think the service has failed and attempt to restart it.  It does this by
> issuing a "stop" command, taking down the other resources associated with
> it, and then bringing them all back up.
>
> Its the stop command that can cause some problems.  For examample, in my
> service defined above, I have the service "MySQL", which has the IP
> address, shared storage and the /etc/init.d/mysqld script.  I start it up
> using "clusvcadm -e MySQL" and all is well, it brings everything up in the
> correct order and MySQL is running fine.  Every 30 seconds, I see it
> running a "status" check in the syslog.  So far so good.
>
> Now, I have modified the mysqld script to return the value of "status"
> from
> /etc/init.d/functions as its exit code.  So, when everything is running
> fine, it returns "0" and clurgmgrd is happy.  If I do a "killall -9
> mysqld_safe mysqld" status will now return a value of 2, which is an
> error.
> clurgmgrd will attempt to restart it by issuing the "stop" command to the
> script.  This is where we run into problems.
>
> Since the service is already dead, the startup scripts return an error
> when
> trying to stop the service.  clurgmgrd fails the service and the service
> is
> now down.
>
> The only way I've found around this is to force the "stop" to return 0 no
> matter what.  This way clurgmgrd will believe it has succeeded in shutting
> down the service and will restart it.
>
> My reasoning is that it is better to have it fail starting it up than to
> have it fail stopping a service that is already dead.  I'm sure there are
> other problems with this method, but I haven't identified them yet.
>
>> Also, I've seen clurgmgrd die without logging anything anywhere.  I'll
>> just
>> check the cluster and it won't be running.  All of the services stay
>> running, but the manager is dead.  Restarting it is problematic since it
>> will restart each of the services causing a brief interruption.
>>
>> Anyone have any ideas on how to solve either of these two problems?
>> I've
>> been waiting to deploy the cluster we've put together until I could
>> resolve
>> these two issues, but have run out of things to try.
>
> I'm still seeing clurgmgrd die periodically for no reason, though.  I may
> have to write another script to monitor it as well and run that out of
> cron
> every so often.  That doesn't seem like a very good solution, though since
> it does restart all of the services that are running on that node.
>
>  - Marc
>
> --
> Marc Lewis
> Blarg! Online Services, Inc.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Omer Faruk Sen
http://www.faruk.net



From RMoody at mweb.com  Tue Feb 14 12:37:04 2006
From: RMoody at mweb.com (Robert Moody - MWEB)
Date: Tue, 14 Feb 2006 14:37:04 +0200
Subject: [Linux-cluster] Troubles mounting GFS with a cluster.
Message-ID: <6586D1F97DDEDE408BEEF44402F37978084429@mwmx4.mweb.com>

Hi all,

I am running Red Hat Enterprise Linux ES release 4 (Nahant Update 2). 

I have configured the cluster and created a GFS filesystem. I can mount the GFS via the command line however when I try to get the cluster to mount it it returns the following.

Feb 14 12:25:57 dev-mailhost01 kernel: GFS: fsid=cluster:burp.0: jid=7: Trying to acquire journal lock...
Feb 14 12:25:57 dev-mailhost01 kernel: GFS: fsid=cluster:burp.0: jid=7: Looking at journal...
Feb 14 12:25:57 dev-mailhost01 kernel: GFS: fsid=cluster:burp.0: jid=7: Done
Feb 14 12:25:57 dev-mailhost01 kernel: SELinux: initialized (dev emcpowera1, type gfs), not configured for labeling
Feb 14 12:25:57 dev-mailhost01 kernel: audit(1139912757.742:5): avc:  denied  { mount } for  pid=4286 comm="mount" name="/" dev=emcpowera1 ino=26 scontext=user_u:system_r:initrc_t tcontext=system_u:object_r:unlabeled_t tclass=filesystem
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <notice> start on clusterfs "burp" returned 2 (invalid argument(s))
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <warning> #68: Failed to start filesystem; return value: 1
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <notice> Stopping service filesystem
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <notice> Service filesystem is recovering
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <warning> #71: Relocating failed service filesystem
Feb 14 12:25:57 dev-mailhost01 clurgmgrd[3003]: <notice> Stopping service filesystem
Feb 14 12:25:58 dev-mailhost01 clurgmgrd[3003]: <notice> Service filesystem is stopped


It looks like it is passing an argument to the GFS which it does not like. 

Any ideas as to why this could be happening?

Any and all help as to where to look will be appreciated.

Thanks,

Robert Moody.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060214/2318fa9c/attachment.htm>

From Frank.Weyns at ordina.nl  Tue Feb 14 12:33:46 2006
From: Frank.Weyns at ordina.nl (Weyns, Frank)
Date: Tue, 14 Feb 2006 13:33:46 +0100
Subject: [Linux-cluster] Oracle on a two node cluster: which filesystem ?
Message-ID: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>

Hi,
 
I'm building a two node HA oracle cluster on Redhat. (One active production node and a stand-by node with a test application, which can be shutdown if needed. local system disks, stroage on SAN)
 
The question is:
Which underlying Filesytem to use ?
We would prefer to use LVM with ext3 (stable , resize if needed, experience and knowledge in house)
 
But everybody seems to use GFS: 
Why ?
 What are the advantages to LVM-ext3 ?
and
If we use GFS, what are the things to keep in mind ? 
 
/Frank Weyns
--
 
Disclaimer

Dit bericht met eventuele bijlagen is vertrouwelijk en uitsluitend bestemd voor de geadresseerde. Indien u niet de bedoelde ontvanger bent, wordt u verzocht de afzender te waarschuwen en dit bericht met eventuele bijlagen direct te verwijderen en/of te vernietigen. Het is niet toegestaan dit bericht en eventuele bijlagen te vermenigvuldigen, door te sturen, openbaar te maken, op te slaan of op andere wijze te gebruiken. Ordina N.V. en/of haar groepsmaatschappijen accepteren geen verantwoordelijkheid of aansprakelijkheid voor schade die voortvloeit uit de inhoud en/of de verzending van dit bericht.

This e-mail and any attachments are confidential and is solely intended for the addressee only. If you are not the intended recipient, please notify the sender and delete and/or destroy this message and any attachments immediately. It is prohibited to copy, to distribute, to disclose or to use this e-mail and any attachments in any other way. Ordina N.V. and/or its group companies do not accept any responsibility nor liability for any damage resulting from the content of and/or the transmission of this message.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060214/9831e314/attachment.htm>

From gforte at leopard.us.udel.edu  Tue Feb 14 13:02:37 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 14 Feb 2006 08:02:37 -0500
Subject: [Linux-cluster] Oracle on a two node cluster: which filesystem ?
In-Reply-To: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>
References: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>
Message-ID: <43F1D4ED.7090201@leopard.us.udel.edu>

I have this exact same setup, and concluded that GFS would be less
trouble overall (though that conclusion is still open to re-evaluation
;-)  The problem, as I see it, is that if have your "shared" data on an
ext3 filesystem, then you want to make damn sure that the "failed" node
has unmounted (or simply ceased accessing) that filesystem before the
"backup" node mounts it and takes over the services.  If you have
hardware fencing then this is theoretically not a big problem, because
the failed node should get power cycled - but that's a BIG "should", and
there could be instances in which you want to switch the services over
without power cycling the "failed" node.  Forcing an unmount of an ext3
filesystem is dicey at best, and I just envisioned too many scenarios in
which both nodes ended up having the filesystem mounted simultaneously
and stomped all over the data.

So, the short version is: GFS is designed for this sort of thing, ext3
isn't.  BTW, LVM and GFS certainly aren't mutually exclusive.  I'm using
GFS/LVM with my SAN because the 'optimal' setup with the unit is to have
two separate raid5 stripes (because of the reserved area for the SAN OS
on some, but not all, of the disks), but I still wanted all of the
available space in one big chunk.  As to resizing a GFS filesystem:

http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/s1-manage-growfs.html

-g

Weyns, Frank wrote:
> Hi,
>  
> I'm building a two node HA oracle cluster on Redhat. (One active
> production node and a stand-by node with a test application, which can
> be shutdown if needed. local system disks, stroage on SAN)
>  
> The question is:
> Which underlying Filesytem to use ?
> We would prefer to use LVM with ext3 (stable , resize if needed,
> experience and knowledge in house)
>  
> But everybody seems to use GFS:
> Why ?
>  What are the advantages to LVM-ext3 ?
> and
> If we use GFS, what are the things to keep in mind ? 
>  
> /Frank Weyns
> --
>  
> Disclaimer
> 
> Dit bericht met eventuele bijlagen is vertrouwelijk en uitsluitend
> bestemd voor de geadresseerde. Indien u niet de bedoelde ontvanger bent,
> wordt u verzocht de afzender te waarschuwen en dit bericht met eventuele
> bijlagen direct te verwijderen en/of te vernietigen. Het is niet
> toegestaan dit bericht en eventuele bijlagen te vermenigvuldigen, door
> te sturen, openbaar te maken, op te slaan of op andere wijze te
> gebruiken. Ordina N.V. en/of haar groepsmaatschappijen accepteren geen
> verantwoordelijkheid of aansprakelijkheid voor schade die voortvloeit
> uit de inhoud en/of de verzending van dit bericht.
> 
> This e-mail and any attachments are confidential and is solely intended
> for the addressee only. If you are not the intended recipient, please
> notify the sender and delete and/or destroy this message and any
> attachments immediately. It is prohibited to copy, to distribute, to
> disclose or to use this e-mail and any attachments in any other way.
> Ordina N.V. and/or its group companies do not accept any responsibility
> nor liability for any damage resulting from the content of and/or the
> transmission of this message.
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From basv at sara.nl  Tue Feb 14 15:56:28 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 14 Feb 2006 16:56:28 +0100
Subject: [Linux-cluster] Question about gfs_quota
Message-ID: <43F1FDAC.1080500@sara.nl>

On our cluster we have enabled quota and we export the GFS-filesystems 
as NFS so clients can mount them. On our login nodes we want to display 
the user disk usage if a user login.

This does not work because rpc.quotad knows nothing about the 
GFS-filesystem. Is there a method to display the user disk usage for 
non-GFS clients?

Thanks
-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From lhh at redhat.com  Tue Feb 14 18:47:00 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 13:47:00 -0500
Subject: [Linux-cluster] service restart problem
In-Reply-To: <20060213195619.GA23750@opus.corp.blarg.net>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
Message-ID: <1139942820.5138.107.camel@ayanami.boston.redhat.com>

On Mon, 2006-02-13 at 11:56 -0800, Marc Lewis wrote:

> I'm seeing similar issues here.  The script entry doesn't seem to do
> anything when checking status.
> 
> For example, we have a MySQL service defined with an IP address, a shared
> SAN partition, and the /etc/init.d/mysqld script. 
> 
> The service starts up and shuts down fine when done manually via clusvcadm,
> but if I kill the mysql daemon with the script or manually, the clurgmgrd
> doesn't seem to care.  It just runs its status check, which does report it
> as "stopped" without ever restarting the service.

This should work.  The status check in the mysqld script needs to return
an error (i.e. non-zero) if the 'status' command is run and the service
is not running.

> Also, I've seen clurgmgrd die without logging anything anywhere.  

Probably a segfault which was fixed in CVS about two weeks ago...  It
was the cause of several problems, actually.


> I'll just
> check the cluster and it won't be running.  All of the services stay
> running, but the manager is dead.  Restarting it is problematic since it
> will restart each of the services causing a brief interruption.


> Anyone have any ideas on how to solve either of these two problems?  I've
> been waiting to deploy the cluster we've put together until I could resolve
> these two issues, but have run out of things to try.

If it's not the above, could you provide more information?  Ex. is the
service being reported as "stopped" in "clustat" output?

In the meantime, I will look at the script resource and see if there is
anything obvious.

-- Lon




From lhh at redhat.com  Tue Feb 14 18:58:36 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 13:58:36 -0500
Subject: [Linux-cluster] service restart problem
In-Reply-To: <20060214004003.GE23750@opus.corp.blarg.net>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
	<20060214004003.GE23750@opus.corp.blarg.net>
Message-ID: <1139943516.5138.119.camel@ayanami.boston.redhat.com>

On Mon, 2006-02-13 at 16:40 -0800, Marc Lewis wrote:

> Since the service is already dead, the startup scripts return an error when
> trying to stop the service.  clurgmgrd fails the service and the service is
> now down.

The script should return "0" in stop-after-stop/stop-after-dead cases,
not "1".  It's a bug in /etc/init.d/functions of RHEL4's initscripts
package.


> The only way I've found around this is to force the "stop" to return 0 no
> matter what.  This way clurgmgrd will believe it has succeeded in shutting
> down the service and will restart it.

Please see this bugzilla:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104

There's a patch for /etc/init.d/functions inside it.


> > Anyone have any ideas on how to solve either of these two problems?  I've
> > been waiting to deploy the cluster we've put together until I could resolve
> > these two issues, but have run out of things to try.
> 
> I'm still seeing clurgmgrd die periodically for no reason, though.  I may
> have to write another script to monitor it as well and run that out of cron
> every so often.  That doesn't seem like a very good solution, though since
> it does restart all of the services that are running on that node.

The correct solution is to fix the problem with clurgmgrd, which has
been done in CVS and will be in an errata shortly.

-- Lon




From lhh at redhat.com  Tue Feb 14 19:04:49 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 14:04:49 -0500
Subject: [Linux-cluster] Oracle on a two node cluster: which filesystem ?
In-Reply-To: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>
References: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>
Message-ID: <1139943889.5138.121.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-14 at 13:33 +0100, Weyns, Frank wrote:
> Hi,
>  
> I'm building a two node HA oracle cluster on Redhat. (One active
> production node and a stand-by node with a test application, which can
> be shutdown if needed. local system disks, stroage on SAN)
>  
> The question is:
> Which underlying Filesytem to use ?
> We would prefer to use LVM with ext3 (stable , resize if needed,
> experience and knowledge in house)

ext3 in this case, unless you're trying to do 2-node RAC.

Please toss your oracle init script to this mailing list too ;)

-- Lon





From lhh at redhat.com  Tue Feb 14 19:14:03 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 14:14:03 -0500
Subject: [Linux-cluster] Oracle on a two node cluster: which filesystem ?
In-Reply-To: <43F1D4ED.7090201@leopard.us.udel.edu>
References: <4D30FCF33FE1FC4DB79C18A73D46C67308599A@BA12-0013.work.local>
	<43F1D4ED.7090201@leopard.us.udel.edu>
Message-ID: <1139944443.5138.132.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-14 at 08:02 -0500, Greg Forte wrote:
> I have this exact same setup, and concluded that GFS would be less
> trouble overall (though that conclusion is still open to re-evaluation
> ;-)  The problem, as I see it, is that if have your "shared" data on an
> ext3 filesystem, then you want to make damn sure that the "failed" node
> has unmounted (or simply ceased accessing) that filesystem before the
> "backup" node mounts it and takes over the services. 

Fencing takes care of this...


>  If you have
> hardware fencing then this is theoretically not a big problem, because
> the failed node should get power cycled - but that's a BIG "should", and
> there could be instances in which you want to switch the services over
> without power cycling the "failed" node.

... and is plainly stated as a requirement on all Red Hat Cluster Suite
4 and Red Hat GFS 6.1 installations:

http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/ch-hardware.html#S1-HARDWARE-CHOOSING
http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/s1-sysreq-iofence.html


> Forcing an unmount of an ext3
> filesystem is dicey at best, and I just envisioned too many scenarios in
> which both nodes ended up having the filesystem mounted simultaneously
> and stomped all over the data.

GFS lets you mount the file system on multiple nodes, but in the case of
a network partition or live-hang where both nodes have the file system
mounted, the GFS volume can easily become corrupt if fencing is not
employed.

That said, GFS makes it "nicer" and "quicker" to fail over: by the time
rgmanager gets the "node-transition" event, fencing has already
completed and GFS has already replayed the journal.  With ext3, the
journal is replayed during the mount on the new node.

-- Lon



From lhh at redhat.com  Tue Feb 14 19:46:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 14:46:38 -0500
Subject: [Linux-cluster] Redhat web server cluster with 4 web server's
	& 2 Database server
In-Reply-To: <BAY111-F1143DB5985482F03EDE5BB4040@phx.gbl>
References: <BAY111-F1143DB5985482F03EDE5BB4040@phx.gbl>
Message-ID: <1139946398.5138.139.camel@ayanami.boston.redhat.com>

On Sun, 2006-02-12 at 15:54 +0530, kishore chowdary wrote:

> Most of the content served will be static content all these server r in a 
> good datacenter & r interconected through gigabit ethernet.

Static content is good, and allows the the web servers to perform at
fairly high capacity.

> We have licensed version of Redhat AS 4.2 for all the server's & planning to 
> buy Redhat Cluster Suite from redhat.

> so i need some guidance from u guys on how to setup this cluster,which file 
> system to use.if u have step by step installation notes of starting from 
> redhat OS installation to setting up cluster with mysql redundancy.
> 
> Do i need any hardware apart from this to set up a cluster.

A Dell/EMC Clariion AX100 and some sort of I/O fencing for the database
servers would be nice.  I like WTI switches, but other people like the
APC 7900/7920/7921.

If you're not doing RR DNS or you don't have a load balancer out front,
you might want to grab something to act as an IPVS director.

-- Lon



From lhh at redhat.com  Tue Feb 14 22:14:53 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 14 Feb 2006 17:14:53 -0500
Subject: [Linux-cluster] Fencing agents on DELL PowerEdge servers
In-Reply-To: <1139605850.11236.6.camel@adingman.cin.cook>
References: <43EB5D61.8010008@webbertek.com.br>
	<1139605850.11236.6.camel@adingman.cin.cook>
Message-ID: <1139955293.5138.152.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-10 at 16:10 -0500, Andrew C. Dingman wrote:
> I'm not entirely sure I know which cluster suite versions you mean, but
> using the cluster suite sold for with RHEL 3 all the fencing devices
> were Perl scripts. 

GFS for RHEL3 used perl scripts.

RHCS for RHEL3 used binary modules.

> I managed to write a couple of my own using the
> examples, one a modified agent for APC 7901 power strips and one for
> DRAM/MC devices in blad chassis. The DRAC/MC one is vastly simpler, and
> it seems that one for a regular drac would be even a bit easier. Cluster
> Suite in RHEL 4 supports DRACs out of the box in the GUI, though I
> haven't tested it.

For GFS on RHEL3 and RHEL4 (and RHCS on RHEL4), the fencing agents are
interchangeable.

-- Lon




From omer at faruk.net  Wed Feb 15 08:49:21 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Wed, 15 Feb 2006 10:49:21 +0200 (EET)
Subject: [Linux-cluster] service restart problem
In-Reply-To: <1139943516.5138.119.camel@ayanami.boston.redhat.com>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
	<20060214004003.GE23750@opus.corp.blarg.net>
	<1139943516.5138.119.camel@ayanami.boston.redhat.com>
Message-ID: <62546.193.140.74.2.1139993361.squirrel@193.140.74.2>



Patch in the URL
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=111998 doesn't work
with mysqld init script it seems that all scripts should be revised if it
is working or not with this patch. Httpd seems to work with this patch
though.. (returning 0 even if service is not working..)


> On Mon, 2006-02-13 at 16:40 -0800, Marc Lewis wrote:
>
>> Since the service is already dead, the startup scripts return an error
>> when
>> trying to stop the service.  clurgmgrd fails the service and the service
>> is
>> now down.
>
> The script should return "0" in stop-after-stop/stop-after-dead cases,
> not "1".  It's a bug in /etc/init.d/functions of RHEL4's initscripts
> package.
>
>
>> The only way I've found around this is to force the "stop" to return 0
>> no
>> matter what.  This way clurgmgrd will believe it has succeeded in
>> shutting
>> down the service and will restart it.
>
> Please see this bugzilla:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104
>
> There's a patch for /etc/init.d/functions inside it.
>
>
>> > Anyone have any ideas on how to solve either of these two problems?
>> I've
>> > been waiting to deploy the cluster we've put together until I could
>> resolve
>> > these two issues, but have run out of things to try.
>>
>> I'm still seeing clurgmgrd die periodically for no reason, though.  I
>> may
>> have to write another script to monitor it as well and run that out of
>> cron
>> every so often.  That doesn't seem like a very good solution, though
>> since
>> it does restart all of the services that are running on that node.
>
> The correct solution is to fix the problem with clurgmgrd, which has
> been done in CVS and will be in an errata shortly.
>
> -- Lon
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
Omer Faruk Sen
http://www.faruk.net



From adingman at cookgroup.com  Wed Feb 15 13:22:51 2006
From: adingman at cookgroup.com (Andrew C. Dingman)
Date: Wed, 15 Feb 2006 08:22:51 -0500
Subject: [Linux-cluster] Fencing agents on DELL PowerEdge servers
In-Reply-To: <1139955293.5138.152.camel@ayanami.boston.redhat.com>
References: <43EB5D61.8010008@webbertek.com.br>
	<1139605850.11236.6.camel@adingman.cin.cook>
	<1139955293.5138.152.camel@ayanami.boston.redhat.com>
Message-ID: <1140009771.7668.47.camel@adingman.cin.cook>

OK, got it. My clusters use GFS and the gulm bridge, so I'd missed the
distinction.

On Tue, 2006-02-14 at 17:14 -0500, Lon Hohberger wrote:
> On Fri, 2006-02-10 at 16:10 -0500, Andrew C. Dingman wrote:
> > I'm not entirely sure I know which cluster suite versions you mean, but
> > using the cluster suite sold for with RHEL 3 all the fencing devices
> > were Perl scripts. 
> 
> GFS for RHEL3 used perl scripts.
> 
> RHCS for RHEL3 used binary modules.
> 
> > I managed to write a couple of my own using the
> > examples, one a modified agent for APC 7901 power strips and one for
> > DRAM/MC devices in blad chassis. The DRAC/MC one is vastly simpler, and
> > it seems that one for a regular drac would be even a bit easier. Cluster
> > Suite in RHEL 4 supports DRACs out of the box in the GUI, though I
> > haven't tested it.
> 
> For GFS on RHEL3 and RHEL4 (and RHCS on RHEL4), the fencing agents are
> interchangeable.
> 
> -- Lon
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-- 
Andrew C. Dingman
Unix Administrator
Cook Incorporated
(812)339-2235 x2131
adingman at cookgroup.com



From Anthony.Assi at irisa.fr  Wed Feb 15 14:37:30 2006
From: Anthony.Assi at irisa.fr (Anthony Assi)
Date: Wed, 15 Feb 2006 15:37:30 +0100
Subject: [Linux-cluster] Pam authentification
Message-ID: <43F33CAA.5020806@irisa.fr>

Hi,

i need to restrict access to some nodes of the cluster to all users 
except Root and me,

i tried the following Pam_access method

**in /etc/security/access.conf :
+:root myusername:ALL
-:ALL:ALL


in /etc/pam.d/sshd or /etc/pam.d/login:
account  required  pam_access.so

and then did a:  /etc/init.d/sshd reload



My users are defined in OpenLdap,
this method doesn't seems to work, am i missing something?
Any Other suggestions?



From gabe at msi.umn.edu  Wed Feb 15 14:53:29 2006
From: gabe at msi.umn.edu (Gabe Turner)
Date: Wed, 15 Feb 2006 08:53:29 -0600
Subject: [Linux-cluster] Pam authentification
In-Reply-To: <43F33CAA.5020806@irisa.fr>
References: <43F33CAA.5020806@irisa.fr>
Message-ID: <20060215145329.GB11687@blackice.msi.umn.edu>

On Wed, Feb 15, 2006 at 03:37:30PM +0100, Anthony Assi wrote:
> Hi,
> 
> i need to restrict access to some nodes of the cluster to all users 
> except Root and me,
> 
> i tried the following Pam_access method
> 
> **in /etc/security/access.conf :
> +:root myusername:ALL
> -:ALL:ALL
> 
> 
> in /etc/pam.d/sshd or /etc/pam.d/login:
> account  required  pam_access.so
> 
> and then did a:  /etc/init.d/sshd reload

I use access.conf, but not in as strict a way as above (typically, I want
to allow most users, but only allow root from one or two hosts).  However,
I've never had issues with it working.  You'll also probably want to add
LOCAL to the host field of the allow rule so that cron will be able to su.

Alternatively, until you can get access.conf working, you could use
AllowUsers and AllowGroups in sshd_config:

AllowUsers	root myusername
AllowGroups 	root mygroup

-- 
Gabe Turner                                             gabe at msi.umn.edu
UNIX Systems Administrator,
University of Minnesota Supercomputing Institute
 for Digital Simulation and Advanced Computation         www.msi.umn.edu



From debbiet at arlut.utexas.edu  Wed Feb 15 16:37:57 2006
From: debbiet at arlut.utexas.edu (Debbie Tropiano)
Date: Wed, 15 Feb 2006 10:37:57 -0600
Subject: [Linux-cluster] Pam authentification
In-Reply-To: <43F33CAA.5020806@irisa.fr>
References: <43F33CAA.5020806@irisa.fr>
Message-ID: <20060215163757.GB17971@arlut.utexas.edu>

I have messed with it very thoroughly,
but my /etc/security/access.conf file has

-:ALL EXCEPT root:ALL

I think I may have also tried it with a username after root,
but don't recall now.  I didn't need to restart sshd after
I made the changes to the files (/etc/security/access.conf,
/etc/pam.d/ssh and /etc/pam.d/login.

FWIW  I found the example on this page
http://www.phptr.com/articles/article.asp?p=165226&seqNum=12
very helpful.

Good luck,
Debbie

On Wed, Feb 15, 2006 at 03:37:30PM +0100, Anthony Assi wrote:
> i need to restrict access to some nodes of the cluster to all users 
> except Root and me,
> 
> i tried the following Pam_access method
> 
> **in /etc/security/access.conf :
> +:root myusername:ALL
> -:ALL:ALL
> 
> 
> in /etc/pam.d/sshd or /etc/pam.d/login:
> account  required  pam_access.so
>... 

-- 
| Debbie Tropiano                            |  debbiet at arlut.utexas.edu   |
| Environmental Sciences Laboratory          |     +1 512 835 3367 w       |
| Applied Research Laboratories of UT Austin |     +1 512 835 3544 fax     |
| P.O. Box 8029, Austin, TX 78713-8029       | home email: debbie at icus.com |



From debbiet at arlut.utexas.edu  Wed Feb 15 16:39:07 2006
From: debbiet at arlut.utexas.edu (Debbie Tropiano)
Date: Wed, 15 Feb 2006 10:39:07 -0600
Subject: [Linux-cluster] Pam authentification
In-Reply-To: <20060215163757.GB17971@arlut.utexas.edu>
References: <43F33CAA.5020806@irisa.fr>
	<20060215163757.GB17971@arlut.utexas.edu>
Message-ID: <20060215163907.GC17971@arlut.utexas.edu>

That should read "I HAVEN'T messed with it very thoroughly, ..."

Sorry

On Wed, Feb 15, 2006 at 10:37:57AM -0600, Debbie Tropiano wrote:
> I have messed with it very thoroughly,
> but my /etc/security/access.conf file has
> 
> -:ALL EXCEPT root:ALL
> 
> I think I may have also tried it with a username after root,
> but don't recall now.  I didn't need to restart sshd after
> I made the changes to the files (/etc/security/access.conf,
> /etc/pam.d/ssh and /etc/pam.d/login.
> 
> FWIW  I found the example on this page
> http://www.phptr.com/articles/article.asp?p=165226&seqNum=12
> very helpful.
> 
> Good luck,
> Debbie
> 
> On Wed, Feb 15, 2006 at 03:37:30PM +0100, Anthony Assi wrote:
> > i need to restrict access to some nodes of the cluster to all users 
> > except Root and me,
> > 
> > i tried the following Pam_access method
> > 
> > **in /etc/security/access.conf :
> > +:root myusername:ALL
> > -:ALL:ALL
> > 
> > 
> > in /etc/pam.d/sshd or /etc/pam.d/login:
> > account  required  pam_access.so
> >... 
> 
> -- 
> | Debbie Tropiano                            |  debbiet at arlut.utexas.edu   |
> | Environmental Sciences Laboratory          |     +1 512 835 3367 w       |
> | Applied Research Laboratories of UT Austin |     +1 512 835 3544 fax     |
> | P.O. Box 8029, Austin, TX 78713-8029       | home email: debbie at icus.com |
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
| Debbie Tropiano                            |  debbiet at arlut.utexas.edu   |
| Environmental Sciences Laboratory          |     +1 512 835 3367 w       |
| Applied Research Laboratories of UT Austin |     +1 512 835 3544 fax     |
| P.O. Box 8029, Austin, TX 78713-8029       | home email: debbie at icus.com |



From ben.yarwood at juno.co.uk  Wed Feb 15 16:51:10 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Wed, 15 Feb 2006 16:51:10 -0000
Subject: [Linux-cluster] Fence agents not working as expected
Message-ID: <004c01c63250$05d9cc50$3964a8c0@WS076>

I am using FC4 updated to the lastest releases-updated version and am having
some problems getting the wti fence agent to work.  I think perhaps I have a
config error.  I have a three node cluster.  When a node needs fencing, the
following messages appear

>From member jrmedia-c

Feb 15 10:54:46 jrmedia-c kernel: CMAN: removing node jrmedia-a from the
cluster : Missed too many heartbeats
Feb 15 10:54:47 jrmedia-c fenced[2069]: fencing deferred to jrmedia-b

>From member Jrmedia-b

Feb 15 10:54:45 jrmedia-b fenced[2055]: jrmedia-a not a cluster member after
0 sec post_fail_delay
Feb 15 10:54:45 jrmedia-b fenced[2055]: fencing node "jrmedia-a"
Feb 15 10:54:45 jrmedia-b fence_manual: Node jrmedia-a needs to be reset
before recovery can procede.  Waiting for jrmedia-a to rejoin the cluster or
for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n
jrmedia-a)

After a "fence_ack_manual -n jrmedia-a" the following message appears.

Feb 15 10:56:38 jrmedia-b fenced[2055]: fence "jrmedia-a" success

There is no mention of trying to use the wti fence agent.  It should be
noted the wti fence device has no password and I am not trying to use the
brocade agents that I have named as devices at the moment. Below is an
extract from my cluster.conf file.

        <clusternodes>

                <clusternode name="jrmedia-a">
                        <fence>
                                <!-- "power" method is tried before all
others -->
                                <method name="power">
                                        <device name="wti" port="16"/>
                                </method>
                                <method name="human">
                                        <device name="last_resort"
ipaddr="jrmedia-a"/>
                                </method>
                        </fence>
                </clusternode>

                <clusternode name="jrmedia-b">
                        <fence>
                                <!-- "power" method is tried before all
others -->
                                <method name="power">
                                        <device name="wti" port="15"/>
                                </method>
                                <method name="human">
                                        <device name="last_resort"
ipaddr="jrmedia-b"/>
                                </method>
                        </fence>
                </clusternode>

                <clusternode name="jrmedia-c">
                        <fence>
                                <!-- "power" method is tried before all
others -->
                                <method name="power">
                                        <device name="wti" port="13"/>
                                </method>
                                <method name="human">
                                        <device name="last_resort"
ipaddr="jrmedia-c"/>
                                </method>
                        </fence>
                </clusternode>

        </clusternodes>

        <fencedevices>

                <!-- The WTI fence device requires no login name -->
                <fencedevice name="ibm_3534_a" agent="fence_brocade"
ipaddr="10.0.1.67" login="admin" passwd="xxxx"/>
                <fencedevice name="ibm_3534_b" agent="fence_brocade"
ipaddr="10.0.1.68" login="admin" passwd="xxxx"/>
                <fencedevice name="wti" agent="fence_wti"
ipaddress="10.0.1.40" passwd=""/>
                <fencedevice name="last_resort" agent="fence_manual"/>

        </fencedevices>


Ben Yarwood
Technical Director
Juno Records
t - 020 7424 2804
m - 07930 922 333
e - ben.yarwood at juno.co.uk 






From leonardo.mello at planejamento.gov.br  Wed Feb 15 18:18:56 2006
From: leonardo.mello at planejamento.gov.br (leonardo rodrigues de mello)
Date: Wed, 15 Feb 2006 16:18:56 -0200
Subject: [Linux-cluster] Project Felix
Message-ID: <1140027536.8416.85.camel@mp>

Does anyone here can explain me what is project felix about ? 

i had saw one message from jparsons at sourceware.org importing this
project into the cluster suite cvs. 


best regards

Leonardo Rodrigues de Mello 
(lmello) 



From lhh at redhat.com  Wed Feb 15 23:14:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 15 Feb 2006 18:14:38 -0500
Subject: [Linux-cluster] service restart problem
In-Reply-To: <62546.193.140.74.2.1139993361.squirrel@193.140.74.2>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102>
	<20060213195619.GA23750@opus.corp.blarg.net>
	<20060214004003.GE23750@opus.corp.blarg.net>
	<1139943516.5138.119.camel@ayanami.boston.redhat.com>
	<62546.193.140.74.2.1139993361.squirrel@193.140.74.2>
Message-ID: <1140045278.19355.69.camel@ayanami.boston.redhat.com>

On Wed, 2006-02-15 at 10:49 +0200, Omer Faruk Sen wrote:
> 
> Patch in the URL
> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=111998 doesn't work
> with mysqld init script it seems that all scripts should be revised if it
> is working or not with this patch. Httpd seems to work with this patch
> though.. (returning 0 even if service is not working..)

What mysql n-v-r are you running?

-- Lon



From lhh at redhat.com  Wed Feb 15 23:17:43 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 15 Feb 2006 18:17:43 -0500
Subject: [Linux-cluster] Fence agents not working as expected
In-Reply-To: <004c01c63250$05d9cc50$3964a8c0@WS076>
References: <004c01c63250$05d9cc50$3964a8c0@WS076>
Message-ID: <1140045463.19355.72.camel@ayanami.boston.redhat.com>

On Wed, 2006-02-15 at 16:51 +0000, Ben Yarwood wrote:

> There is no mention of trying to use the wti fence agent.  It should be
> noted the wti fence device has no password 

This could very well be the problem...

[root at red ~]# fence_wti
ipaddr=127.0.0.1
port=1 
failed: no password

> and I am not trying to use the
> brocade agents that I have named as devices at the moment. Below is an
> extract from my cluster.conf file.

First, try configuring a password.

-- Lon




From omer at faruk.net  Thu Feb 16 06:33:28 2006
From: omer at faruk.net (Omer Faruk Sen)
Date: Thu, 16 Feb 2006 08:33:28 +0200 (EET)
Subject: [Linux-cluster] service restart problem
In-Reply-To: <1140045278.19355.69.camel@ayanami.boston.redhat.com>
References: <1451.85.102.208.102.1139695052.squirrel@85.102.208.102> 
	<20060213195619.GA23750@opus.corp.blarg.net> 
	<20060214004003.GE23750@opus.corp.blarg.net> 
	<1139943516.5138.119.camel@ayanami.boston.redhat.com> 
	<62546.193.140.74.2.1139993361.squirrel@193.140.74.2>
	<1140045278.19355.69.camel@ayanami.boston.redhat.com>
Message-ID: <53954.193.140.74.2.1140071608.squirrel@193.140.74.2>

The one that comes with rhel AS 4.2: mysql-server-4.1.12-3.RHEL4.1



> On Wed, 2006-02-15 at 10:49 +0200, Omer Faruk Sen wrote:
>>
>> Patch in the URL
>> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=111998 doesn't
>> work
>> with mysqld init script it seems that all scripts should be revised if
>> it
>> is working or not with this patch. Httpd seems to work with this patch
>> though.. (returning 0 even if service is not working..)
>
> What mysql n-v-r are you running?
>
> -- Lon
>
>


-- 
Omer Faruk Sen
http://www.faruk.net



From fabbione at fabbione.net  Thu Feb 16 12:09:30 2006
From: fabbione at fabbione.net (Fabio Massimo Di Nitto)
Date: Thu, 16 Feb 2006 13:09:30 +0100
Subject: [Linux-cluster] Re: Import
In-Reply-To: <20060215142959.7879.qmail@sourceware.org>
References: <20060215142959.7879.qmail@sourceware.org>
Message-ID: <43F46B7A.4090906@fabbione.net>

jparsons at sourceware.org wrote:
> CVSROOT:	/cvs/cluster
> Module name:	felix
> Changes by:	jparsons at sourceware.org	2006-02-15 14:29:59
> 
> Log message:
>     Genesis of Felix Project
>     
>     Status:

Hey,

thanks for this amazing gift of free software.. but WHAT IS IT? ;)

Thanks
Fabio

PS in attachment a little patch to fix building ;)

-- 
I'm going to make him an offer he can't refuse.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: felix.patch
Type: text/x-patch
Size: 1952 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060216/ba8f6893/attachment.bin>

From lhh at redhat.com  Thu Feb 16 14:46:43 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 16 Feb 2006 09:46:43 -0500
Subject: [Linux-cluster] Oracle on a two node cluster: which filesystem ?
In-Reply-To: <20060216054620.61132.qmail@web52305.mail.yahoo.com>
References: <20060216054620.61132.qmail@web52305.mail.yahoo.com>
Message-ID: <1140101203.19355.96.camel@ayanami.boston.redhat.com>

On Wed, 2006-02-15 at 21:46 -0800, SUVANKAR MOITRA wrote:
> dear lon,
> 
> I want impliment oracle 10g on cluster suite 4. Its a
> active/passive cluster.How can i install oracle on
> cluster.
> Pl help me.
> 
> regards
> 
> Suvankar Moitra
> India, kolkata

It's been awhile since I (personally) have installed Oracle in a cold
failover cluster, but I know it's been done.

There is an oracleas script I sent to linux-cluster awhile back which is
probably a good starting point for build a resource agent to start/stop
Oracle.  It is untested on RHCS4, but was tested with an Oracle "Cold
Failover Cluster" installation on RHEL 2.1 and RHCS3 in some way:

http://www.oracle.com/technology/products/ias/hi_av/10g_904_HA_Certification.html

See here for some older RHCS3 documentation, which mostly still applies
to RHCS4:

http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/ch-db-service.html#S1-SERVICE-ORACLE

Oracle has extensive documentation (I had it in my hand at one point
some time ago ;) ) on how to install Oracle software in a cold failover
cluster environment.

The main trick is to mount all the file systems and bring up the
expected service IP address *before* running the Oracle installer.
Also, make sure the Oracle user's home directory is not on a shared file
system.

You can install the Oracle software on shared storage if you want to, or
you can install it on each node individually.  However, the
tables/data/undo/etc. must be on shared storage which is accessible by
all nodes.

After that, craft the service, which would look something like:

   <service name="oracle" ...>
     <fs name="oracle_data" mountpoint="/oradata" .../>
     <fs name="oracle_home" mountpoint="/orahome" .../>
     <fs name="oracle_undo" mountpoint="/oraundo" .../>
     <ip address="192.168.1.2" .../>
     <script name="oracle" path="/home/oracle/oracle" .../>
   </service>

Good luck!

-- Lon



From ben.yarwood at juno.co.uk  Thu Feb 16 14:49:27 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Thu, 16 Feb 2006 14:49:27 -0000
Subject: [Linux-cluster] Fence agents not working as expected
In-Reply-To: <1140045463.19355.72.camel@ayanami.boston.redhat.com>
Message-ID: <00d701c63308$2f796e40$3964a8c0@WS076>

Ok progress, but still not working

Feb 16 10:34:33 jrmedia-b kernel: CMAN: node jrmedia-a has been removed from
the cluster : Missed too many heartbeats
Feb 16 10:34:34 jrmedia-b fenced[2055]: jrmedia-a not a cluster member after
0 sec post_fail_delay
Feb 16 10:34:34 jrmedia-b fenced[2055]: fencing node "jrmedia-a"
Feb 16 10:34:37 jrmedia-b fenced[2055]: agent "fence_wti" reports: parse
error: unknown option "ipaddress=10.0.1.40" 

I changed ipaddress= to ipaddr= and now I get the following 

Feb 16 14:44:20 jrmedia-a fenced[2110]: agent "fence_wti" reports: failed:
no operation specified 


Ben



Ben Yarwood
Technical Director
Juno Records
t - 020 7424 2804
m - 07930 922 333
e - ben.yarwood at juno.co.uk 


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: 15 February 2006 23:18
To: linux clustering
Subject: Re: [Linux-cluster] Fence agents not working as expected

On Wed, 2006-02-15 at 16:51 +0000, Ben Yarwood wrote:

> There is no mention of trying to use the wti fence agent.  It should 
> be noted the wti fence device has no password

This could very well be the problem...

[root at red ~]# fence_wti
ipaddr=127.0.0.1
port=1
failed: no password

> and I am not trying to use the
> brocade agents that I have named as devices at the moment. Below is an 
> extract from my cluster.conf file.

First, try configuring a password.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster





From lhh at redhat.com  Thu Feb 16 15:23:21 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 16 Feb 2006 10:23:21 -0500
Subject: [Linux-cluster] Fence agents not working as expected
In-Reply-To: <00d701c63308$2f796e40$3964a8c0@WS076>
References: <00d701c63308$2f796e40$3964a8c0@WS076>
Message-ID: <1140103401.19355.100.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-16 at 14:49 +0000, Ben Yarwood wrote:
> Ok progress, but still not working
> 
> Feb 16 10:34:33 jrmedia-b kernel: CMAN: node jrmedia-a has been removed from
> the cluster : Missed too many heartbeats
> Feb 16 10:34:34 jrmedia-b fenced[2055]: jrmedia-a not a cluster member after
> 0 sec post_fail_delay
> Feb 16 10:34:34 jrmedia-b fenced[2055]: fencing node "jrmedia-a"
> Feb 16 10:34:37 jrmedia-b fenced[2055]: agent "fence_wti" reports: parse
> error: unknown option "ipaddress=10.0.1.40" 
> 
> I changed ipaddress= to ipaddr= and now I get the following 
> 
> Feb 16 14:44:20 jrmedia-a fenced[2110]: agent "fence_wti" reports: failed:
> no operation specified 

This was fixed awhile ago.  The default was supposed to be "reboot"; and
is fixed in current CVS and ought to have been fixed in RHCS4U2.

Add option="off" for now.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162805
http://rhn.redhat.com/errata/RHBA-2005-737.html

-- Lon



From lhh at redhat.com  Thu Feb 16 18:17:06 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 16 Feb 2006 13:17:06 -0500
Subject: [Linux-cluster] Problem with RHEL 4.0 NFS service
In-Reply-To: <EB190CD1E73E1146ACB7694746E205A801655CA8@hx1.ums.msfc.nasa.gov>
References: <EB190CD1E73E1146ACB7694746E205A801655CA8@hx1.ums.msfc.nasa.gov>
Message-ID: <1140113826.16320.2.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-09 at 11:49 -0600, Dalton, Maurice wrote:
> I am trying to create an RHEL4.0 NFS service. I have a 2 system
> cluster. 
> Cluster.conf file is as below

         <service autostart="1" domain="dpimstst" name="dpims"> 
                 <ip ref="198.1xx.xxx.xxx"/> 
                 <fs ref="dpims"> 
                        <nfsexport ref="disk1"> 
                             <nfsclient ref="export"/> 
                        </nfsexport>
                 </fs> 
         </service> 

Slight change.

HTH

-- Lon





From claudio.tassini at gmail.com  Fri Feb 17 12:58:05 2006
From: claudio.tassini at gmail.com (Claudio Tassini)
Date: Fri, 17 Feb 2006 13:58:05 +0100
Subject: [Linux-cluster] Fencing with Sun servers LOM
Message-ID: <39fdf1c70602170458t46f1330eh@mail.gmail.com>

Hi all,

i noticed that IBM and HP lom chips can be used to fence a cluster node. Sun
Microsystems servers use a proprietary LOM chip that can also be used to
power-cycle the server. Is there any project to write an appropriate fence
agent?

TIA all.

--
Claudio Tassini
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060217/d01695de/attachment.htm>

From brilong at cisco.com  Fri Feb 17 16:45:12 2006
From: brilong at cisco.com (Brian Long)
Date: Fri, 17 Feb 2006 11:45:12 -0500
Subject: [Linux-cluster] May I turn on direct I/O and/or async I/O when
	using GFS as the cluster file system for Oracle9i RAC?
In-Reply-To: <43F0D5F2.9040906@redhat.com>
References: <d4e2d9970602130823y15edf756oc2cfb8f6be5d69a9@mail.gmail.com>
	<43F0B545.3020708@redhat.com>
	<d4e2d9970602130941wbef0c29wfa041841783a7c7f@mail.gmail.com>
	<43F0D5F2.9040906@redhat.com>
Message-ID: <1140194712.4319.1.camel@brilong-lnx>

On Mon, 2006-02-13 at 13:54 -0500, Wendy Cheng wrote:
> Thai Duong wrote:
> 
> > Hi Wendy,
> >
> > Thx for the information. I'll turn on direct I/O on my system. Do you 
> > have any information about how many percent I should gain in term of 
> > I/O performance when turning direct I/O on in general and with Oracle 
> > RAC in particular?
> >
> Our performance group did pass the numbers around sometime ago - the 
> gain seemed to be quite positive but I don't remember the details. On 
> the other hand, do remember this is not a frequent execution code path 
> so try to take the switching steps slowly (i.e. make sure you have 
> backup plan, etc, particularly on GFS 6.0 system).

Wendy,

Any change these performance numbers could be shared with the list?

Thanks.

/Brian/

-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From claudio.tassini at gmail.com  Fri Feb 17 16:57:23 2006
From: claudio.tassini at gmail.com (Claudio Tassini)
Date: Fri, 17 Feb 2006 17:57:23 +0100
Subject: [Linux-cluster] heartbeat network
Message-ID: <39fdf1c70602170857k167e5eefs@mail.gmail.com>

Hi all,
I have a rhcs4 with gfs on a three nodes cluster. The primary network
interfaces are configured for bonding, but if a node becomes unresponsive I
would like to have at least another separated network that the cluster
should use for checking other nodes' status. That is, I need that each node
checks the status of other nodes on the primary network interface (the one
associated with the hostname), AND on another network, private to the
cluster, also configured with bonding.

Is it possible?

--
Claudio Tassini
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060217/e1ed1717/attachment.htm>

From jparsons at redhat.com  Fri Feb 17 19:22:05 2006
From: jparsons at redhat.com (James Parsons)
Date: Fri, 17 Feb 2006 14:22:05 -0500
Subject: [Linux-cluster] re: Project Felix
Message-ID: <43F6225D.2060608@redhat.com>

Thanks for the interest. Documentation on the effort will be checked 
into the repository today. I will mention on this list when it is available.

-Jim



From iga.cnrst at gmail.com  Sat Feb 18 00:46:53 2006
From: iga.cnrst at gmail.com (hicham lotfi simo iga)
Date: Sat, 18 Feb 2006 00:46:53 +0000
Subject: [Linux-cluster] Re: system-cluster-manager
In-Reply-To: <334ba50e0602091358i5eba7372m@mail.gmail.com>
References: <334ba50e0602091358i5eba7372m@mail.gmail.com>
Message-ID: <334ba50e0602171646u3272b714o@mail.gmail.com>

i would ask about this rpm : system-cluster-manager of the Scientific Linux
Distribution if anyone knows how to use it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/19a35e8f/attachment.htm>

From iga.cnrst at gmail.com  Sat Feb 18 00:47:34 2006
From: iga.cnrst at gmail.com (hicham lotfi simo iga)
Date: Sat, 18 Feb 2006 00:47:34 +0000
Subject: [Linux-cluster] Configuring cluster with Scientific Linux
In-Reply-To: <334ba50e0602171642j50d199cj@mail.gmail.com>
References: <334ba50e0602171642j50d199cj@mail.gmail.com>
Message-ID: <334ba50e0602171647h1b8366d9n@mail.gmail.com>

hello, i woul'd ask about a free destribition to install i cluster with 5
servers.
i have Scientific Linux but i don't knou how to use it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/cf0ee5ec/attachment.htm>

From iga.cnrst at gmail.com  Sat Feb 18 00:42:45 2006
From: iga.cnrst at gmail.com (hicham lotfi simo iga)
Date: Sat, 18 Feb 2006 00:42:45 +0000
Subject: [Linux-cluster] Configuring cluster with Scientific Linux
Message-ID: <334ba50e0602171642j50d199cj@mail.gmail.com>

hello, i woul'd ask about a free destribition to install i cluster with 5
servers.
i have Scientific Linux but i don't knou how to use it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/16351e2f/attachment.htm>

From hyclak at math.ohiou.edu  Sat Feb 18 01:43:34 2006
From: hyclak at math.ohiou.edu (Matt Hyclak)
Date: Fri, 17 Feb 2006 20:43:34 -0500
Subject: [Linux-cluster] Re: system-cluster-manager
In-Reply-To: <334ba50e0602171646u3272b714o@mail.gmail.com>
References: <334ba50e0602091358i5eba7372m@mail.gmail.com>
	<334ba50e0602171646u3272b714o@mail.gmail.com>
Message-ID: <20060218014333.GA1532@math.ohiou.edu>

On Sat, Feb 18, 2006 at 12:46:53AM +0000, hicham lotfi simo iga enlightened us:
> i would ask about this rpm : system-cluster-manager of the Scientific Linux
> Distribution if anyone knows how to use it.

http://www.redhat.com/docs/manuals/csgfs/ would be a good start.

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263



From renapte at vsnl.net  Sat Feb 18 15:32:03 2006
From: renapte at vsnl.net (renapte at vsnl.net)
Date: Sat, 18 Feb 2006 20:32:03 +0500
Subject: [Linux-cluster] Synchronous Locking Function for Kernel
Message-ID: <51b5e351b218.51b21851b5e3@vsnl.net>

Hi,
I'd like to know if there is any Synchronous Locking Function in the DLM that can be used in the kernel.
Thanks in advance,
Renuka.



From DYM3FYEHE at aol.com  Sat Feb 18 19:00:44 2006
From: DYM3FYEHE at aol.com (DYM3FYEHE at aol.com)
Date: Sat, 18 Feb 2006 14:00:44 EST
Subject: [Linux-cluster] Service try to start before disk mount and failed
Message-ID: <2bc.5415bc0.3128c8dc@aol.com>

 
 
 
I am using RHCS V4 for Oracle  database and another application running on 
the two node failover cluster  system. The application has different services 
and installed in the shared  storage. The cluster manager starts the services 
before the shared drives  mounted to the second node. I use script to start and 
stop the service with time  delay but still the service start before the 
shared storage moved to the second  node and failed. Is any body knows how I can 
control the storage resource and  the service? I want to make the service to 
wait until the shared storage  available in the second node.

 
thanks,


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/d022829f/attachment.htm>

From cboudjnah at squiz.net  Mon Feb 20 08:13:36 2006
From: cboudjnah at squiz.net (Chmouel Boudjnah)
Date: Mon, 20 Feb 2006 19:13:36 +1100
Subject: [Linux-cluster] GFS Crash
Message-ID: <87bqx28n2n.fsf@paris.squiz.net>

Hi,

Using GFS on RHEL4 (kernel 2.6.9-22.0.2.ELsmp) i got a crash with
it. 

Before in the dmesg, there is some OOM-Killer activity with httpd.
It seems to get something to do with the HIGHMEM and the 4GB of RAM.
It looks like the OOM killer keep the pages still dirty until the
physical RAM filled (but i am pretty sure i can get wrong about that,
i don't really know how works the OOM-Killer theses days).

So i am not sure what's trigering the crash, is it because of the OOM
killer or because of GFS.

Here is the crash of GFS :

,----
| Feb 20 02:29:38 www01 kernel: 6810 en punlock 7,4034f6
| Feb 20 02:29:38 www01 kernel: 6810 ex punlock -2
| Feb 20 02:29:38 www01 kernel:
| Feb 20 02:29:38 www01 kernel: lock_dlm:  Assertion failed on line 357 of file /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/smp/src/dlm/lock.c
| Feb 20 02:29:38 www01 kernel: lock_dlm:  assertion:  "!error"
| Feb 20 02:29:38 www01 kernel: lock_dlm:  time = 2296932048
| Feb 20 02:29:38 www01 kernel: matrix: error=-22 num=2,6b66f lkf=10000 flags=84
| Feb 20 02:29:38 www01 kernel:
| Feb 20 02:29:38 www01 kernel: ------------[ cut here ]------------
| Feb 20 02:29:38 www01 kernel: kernel BUG at /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/smp/src/dlm/lock.c:357!
| Feb 20 02:29:38 www01 kernel: invalid operand: 0000 [#3]
| Feb 20 02:29:38 www01 kernel: SMP
| Feb 20 02:29:38 www01 kernel: Modules linked in: iptable_nat ip_conntrack iptable_filter ip_tables lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) nls_utf8 loop autofs4 md5 ipv6 microcode dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random e100 mii tg3 floppy ext3
| jbd cciss sd_mod scsi_mod
| Feb 20 02:29:38 www01 kernel: CPU:    2
| Feb 20 02:29:38 www01 kernel: EIP:    0060:[<f89e65f3>]    Not tainted VLI
| Feb 20 02:29:38 www01 kernel: EFLAGS: 00010246   (2.6.9-22.0.2.ELsmp)
| Feb 20 02:29:38 www01 kernel: EIP is at do_dlm_unlock+0x8b/0xa0 [lock_dlm]
| Feb 20 02:29:38 www01 kernel: eax: 00000001   ebx: f2afea80   ecx: f3c8bf2c   edx: f89eb175
| Feb 20 02:29:38 www01 kernel: esi: ffffffea   edi: f2afea80   ebp: f89bf000   esp: f3c8bf28
| Feb 20 02:29:38 www01 kernel: ds: 007b   es: 007b   ss: 0068
| Feb 20 02:29:38 www01 kernel: Process gfs_glockd (pid: 5950, threadinfo=f3c8b000 task=f65a2eb0)
| Feb 20 02:29:38 www01 kernel: Stack: f89eb175 f89bf000 00000003 f89e6893 f8abc5ee 00000001 d9d85314 d9d852f8
| Feb 20 02:29:38 www01 kernel:        f8ab2892 f8ae7580 cae61980 d9d852f8 f8ae7580 d9d852f8 f8ab1d8b d9d852f8
| Feb 20 02:29:38 www01 kernel:        00000001 d9d8538c f8ab1e42 d9d852f8 d9d85314 f8ab1f65 00000001 d9d85314
| Feb 20 02:29:38 www01 kernel: Call Trace:
| Feb 20 02:29:38 www01 kernel:  [<f89e6893>] lm_dlm_unlock+0x14/0x1c [lock_dlm]
| Feb 20 02:29:38 www01 kernel:  [<f8abc5ee>] gfs_lm_unlock+0x2c/0x42 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab2892>] gfs_glock_drop_th+0xf3/0x12d [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1d8b>] rq_demote+0x7f/0x98 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1e42>] run_queue+0x5a/0xc1 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1f65>] unlock_on_glock+0x1f/0x28 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab3ec9>] gfs_reclaim_glock+0xc3/0x13c [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8aa6e05>] gfs_glockd+0x39/0xde [gfs]
| Feb 20 02:29:38 www01 kernel:  [<c011e45f>] default_wake_function+0x0/0xc
| Feb 20 02:29:38 www01 kernel:  [<c02d129e>] ret_from_fork+0x6/0x14
| Feb 20 02:29:38 www01 kernel:  [<c011e45f>] default_wake_function+0x0/0xc
| Feb 20 02:29:38 www01 kernel:  [<f8aa6dcc>] gfs_glockd+0x0/0xde [gfs]
| Feb 20 02:29:38 www01 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
| Feb 20 02:29:38 www01 kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 6d b2 9e f8 e8 5c bc 73 c7 83 c4 34 68 75 b1 9e f8 e8 4f bc 73 c7 <0f> 0b 65 01 b2 b0 9e f8 68 77 b1 9e f8 e8 0a b4 73 c7 5b 5e c3
`----

And here is one of the OOM-Killer cleanup stuff :

,----
| Feb 20 02:29:36 www01 kernel: cpu 0 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 0 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 1 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 1 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 2 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 2 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 3 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 3 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel:
| Feb 20 02:29:36 www01 kernel: Free pages:       75308kB (62400kB HighMem)
| Feb 20 02:29:36 www01 kernel: Active:622325 inactive:354228 dirty:0 writeback:0 unstable:0 free:18827 slab:19992 mapped:976121 pagetables:2922
| Feb 20 02:29:36 www01 kernel: DMA free:12588kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:15087 all_unreclaimable? yes
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: Normal free:320kB min:928kB low:1856kB high:2784kB active:508288kB inactive:265708kB present:901120kB pages_scanned:2805165 all_unreclaimable? yes
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: HighMem free:62400kB min:512kB low:1024kB high:1536kB active:1980884kB inactive:1151332kB present:3735548kB pages_scanned:0 all_unreclaimable? no
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: DMA: 3*4kB 4*8kB 4*16kB 2*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12588kB
| Feb 20 02:29:36 www01 kernel: Normal: 0*4kB 4*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 320kB
| Feb 20 02:29:36 www01 kernel: HighMem: 1152*4kB 1118*8kB 1179*16kB 903*32kB 7*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 62400kB
| Feb 20 02:29:36 www01 kernel: Swap cache: add 528754, delete 527948, find 173/299, race 0+0
| Feb 20 02:29:36 www01 kernel: 0 bounce buffer pages
| Feb 20 02:29:36 www01 kernel: Free swap:            0kB
| Feb 20 02:29:36 www01 kernel: 1163263 pages of RAM
| Feb 20 02:29:36 www01 kernel: 802802 pages of HIGHMEM
| Feb 20 02:29:36 www01 kernel: 141639 reserved pages
| Feb 20 02:29:36 www01 kernel: 22868 pages shared
| Feb 20 02:29:36 www01 kernel: 806 pages swap cached
| Feb 20 02:29:36 www01 kernel: Out of Memory: Killed process 6122 (httpd).
`----

Someone has an idea about it ?

Cheers, Chmouel.

-- 
Chmouel Boudjnah - Squiz.net - http://www.squiz.net/



From pcaulfie at redhat.com  Mon Feb 20 09:03:08 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 20 Feb 2006 09:03:08 +0000
Subject: [Linux-cluster] Synchronous Locking Function for Kernel
In-Reply-To: <51b5e351b218.51b21851b5e3@vsnl.net>
References: <51b5e351b218.51b21851b5e3@vsnl.net>
Message-ID: <43F985CC.6040508@redhat.com>

renapte at vsnl.net wrote:
> Hi,
> I'd like to know if there is any Synchronous Locking Function in the DLM that can be used in the kernel.
> Thanks in advance,

No there isn't at the moment. It would be pretty easy to add but this is the
first time anyone has shown any interest. Most kernel activity is asynchronous
anyway.

-- 

patrick



From pcaulfie at redhat.com  Mon Feb 20 09:04:05 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 20 Feb 2006 09:04:05 +0000
Subject: [Linux-cluster] heartbeat network
In-Reply-To: <39fdf1c70602170857k167e5eefs@mail.gmail.com>
References: <39fdf1c70602170857k167e5eefs@mail.gmail.com>
Message-ID: <43F98605.4050405@redhat.com>

Claudio Tassini wrote:
> Hi all,
> I have a rhcs4 with gfs on a three nodes cluster. The primary network
> interfaces are configured for bonding, but if a node becomes
> unresponsive I would like to have at least another separated network
> that the cluster should use for checking other nodes' status. That is, I
> need that each node checks the status of other nodes on the primary
> network interface (the one associated with the hostname), AND on another
> network, private to the cluster, also configured with bonding.
> 
> Is it possible?
> 

no, sorry.


-- 

patrick



From erwan at seanodes.com  Mon Feb 20 10:06:03 2006
From: erwan at seanodes.com (Velu Erwan)
Date: Mon, 20 Feb 2006 11:06:03 +0100
Subject: [Linux-cluster] GFS Crash
In-Reply-To: <87bqx28n2n.fsf@paris.squiz.net>
References: <87bqx28n2n.fsf@paris.squiz.net>
Message-ID: <43F9948B.5000302@seanodes.com>

Chmouel Boudjnah a ?crit :

>Hi,
>  
>
Hey chmouel,
Nice to see you ;)

>Using GFS on RHEL4 (kernel 2.6.9-22.0.2.ELsmp) i got a crash with
>it. 
>
>Before in the dmesg, there is some OOM-Killer activity with httpd.
>It seems to get something to do with the HIGHMEM and the 4GB of RAM.
>  
>
I don't know if its related directly with your troubles but I saw this 
in the 2.6.11 changelog:

[PATCH] vmscan: count writeback pages in nr_scanned
	
	OOM kills have been observed with 70% of the pages in lowmem being in the
	writeback state.  If we count those pages in sc->nr_scanned, the VM should
	throttle and wait for IO completion, instead of OOM killing.
	
	(akpm: this is how the code was designed to work - we broke it six months
	ago).

<andrea at suse.de>
	[PATCH] mm: adjust dirty threshold for lowmem-only mappings
	
	With Rik van Riel <riel at redhat.com>
	
	Simply running "dd if=/dev/zero of=/dev/hd<one you can miss>" can cause
	excessive amounts of dirty lowmem pagecache, due to the blockdev's
	non-highmem page allocation strategy.
	
	This patch effectively lowers the dirty limit for mappings which cannot be
	cached in highmem, counting the dirty limit as a percentage of lowmem
	instead.  This should prevent heavy block device writers from pushing the
	VM over the edge and triggering OOM kills.
	
	Signed-off-by: Rik van Riel <riel at redhat.com>
	Acked-by: Andrea Arcangeli <andrea at suse.de>
	Signed-off-by: Andrew Morton <akpm at osdl.org>
	Signed-off-by: Linus Torvalds <torvalds at osdl.org>


Erwan,



From renapte at vsnl.net  Mon Feb 20 13:12:23 2006
From: renapte at vsnl.net (renapte at vsnl.net)
Date: Mon, 20 Feb 2006 18:12:23 +0500
Subject: [Linux-cluster] Synchronous Locking Function for Kernel
Message-ID: <77d24477ce55.77ce5577d244@vsnl.net>

Hi,
I'm trying to use the DLM independently to acquire locks in the ext2 code. I'm calling the functions that refresh the kernel data structures in the AST after taking the locks. I'd like to know how to find out when this AST has been called, ie when do I know the final result of the locking so as to proceed in the ext2 function. I can see a completion structure in the holder structure in the glock layer. Is that what performs this task in the gfs?
Thanks
Renuka.

----- Original Message -----
From: Patrick Caulfield <pcaulfie at redhat.com>
Date: Monday, February 20, 2006 2:33 pm
Subject: Re: [Linux-cluster] Synchronous Locking Function for Kernel

> renapte at vsnl.net wrote:
> > Hi,
> > I'd like to know if there is any Synchronous Locking Function in 
> the DLM that can be used in the kernel.
> > Thanks in advance,
> 
> No there isn't at the moment. It would be pretty easy to add but 
> this is the
> first time anyone has shown any interest. Most kernel activity is 
> asynchronousanyway.
> 
> -- 
> 
> patrick
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From pcaulfie at redhat.com  Mon Feb 20 13:27:01 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 20 Feb 2006 13:27:01 +0000
Subject: [Linux-cluster] Synchronous Locking Function for Kernel
In-Reply-To: <77d24477ce55.77ce5577d244@vsnl.net>
References: <77d24477ce55.77ce5577d244@vsnl.net>
Message-ID: <43F9C3A5.50102@redhat.com>

renapte at vsnl.net wrote:
> Hi,
> I'm trying to use the DLM independently to acquire locks in the ext2 code. I'm calling the functions that refresh the kernel data structures in the AST after taking the locks. I'd like to know how to find out when this AST has been called, ie when do I know the final result of the locking so as to proceed in the ext2 function. I can see a completion structure in the holder structure in the glock layer. Is that what performs this task in the gfs?

Personally I'd use a waitqueue.

Allocate a structure that contains the LKSB and a waitqueue and pass that as
the ast parameter. Then you can wake up the mainline code in the AST routine.

-- 

patrick



From brilong at cisco.com  Mon Feb 20 19:27:51 2006
From: brilong at cisco.com (Brian Long)
Date: Mon, 20 Feb 2006 14:27:51 -0500
Subject: [Linux-cluster] 9i RAC GFS Certification?
Message-ID: <1140463671.4899.43.camel@brilong-lnx>

Hello,

I'm fairly new to this list.  Could someone point me to the document
which states Oracle and Red Hat have certified 9i RAC on GFS?  I'm
looking for a certification matrix; for example, I've heard only GLM,
not DLM, is certified as the locking manager.  Is this correct?

I've read the GFS 6.1 docs and I'm starting the RHCS guide.  Any other
documents would be greatly appreciated.  I've already seen the thread
about 2-node to 3+-node migration hell which requires a cluster down
time (yuck!).

Thanks for any tips as I try to certify 9i RAC on GFS for use in our IT
dept.

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From brilong at cisco.com  Mon Feb 20 21:05:14 2006
From: brilong at cisco.com (Brian Long)
Date: Mon, 20 Feb 2006 16:05:14 -0500
Subject: [Linux-cluster] 9i RAC GFS Certification?
In-Reply-To: <1140463671.4899.43.camel@brilong-lnx>
References: <1140463671.4899.43.camel@brilong-lnx>
Message-ID: <1140469514.4899.47.camel@brilong-lnx>

On Mon, 2006-02-20 at 14:27 -0500, Brian Long wrote:
> Hello,
> 
> I'm fairly new to this list.  Could someone point me to the document
> which states Oracle and Red Hat have certified 9i RAC on GFS?  I'm
> looking for a certification matrix; for example, I've heard only GLM,
> not DLM, is certified as the locking manager.  Is this correct?

After reading a few Metalink articles, it appears the only certified
solution for 9i RAC is GFS 6.0 on RHEL 3 32-bit only.  This severely
limits things since I was hoping to deploy 9i RAC on RHEL 4 with GFS
6.1  :(

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From lhh at redhat.com  Mon Feb 20 22:18:32 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 20 Feb 2006 17:18:32 -0500
Subject: [Linux-cluster] Service try to start before disk mount and failed
In-Reply-To: <2bc.5415bc0.3128c8dc@aol.com>
References: <2bc.5415bc0.3128c8dc@aol.com>
Message-ID: <1140473912.3048.6.camel@ayanami.boston.redhat.com>

On Sat, 2006-02-18 at 14:00 -0500, DYM3FYEHE at aol.com wrote:
> I am using RHCS V4 for Oracle database and another application running
> on the two node failover cluster system. The application has different
> services and installed in the shared storage. The cluster manager
> starts the services before the shared drives mounted to the second
> node. I use script to start and stop the service with time delay but
> still the service start before the shared storage moved to the second
> node and failed. Is any body knows how I can control the storage
> resource and the service? I want to make the service to wait until the
> shared storage available in the second node.

Hi,

I'm not exactly sure what you're trying to do.  Shared storage is
expected to be concurrently accessible by all nodes at the same time.
Are you trying to use one service as shared storage for another service?

Having a delay can be racy...

I'm sure there's an answer to your question. :) 

-- Lon




From cboudjnah at squiz.net  Mon Feb 20 23:13:46 2006
From: cboudjnah at squiz.net (Chmouel Boudjnah)
Date: Tue, 21 Feb 2006 10:13:46 +1100
Subject: [Linux-cluster] GFS Crash
In-Reply-To: <43F9948B.5000302@seanodes.com> (Velu Erwan's message of "Mon,
	20 Feb 2006 11:06:03 +0100")
References: <87bqx28n2n.fsf@paris.squiz.net> <43F9948B.5000302@seanodes.com>
Message-ID: <87accl7hed.fsf@paris.squiz.net>

Velu Erwan <erwan at seanodes.com> writes:

> Hey chmouel,
> Nice to see you ;)

Hey Erwan, yep nice to see you too ;)

>>Using GFS on RHEL4 (kernel 2.6.9-22.0.2.ELsmp) i got a crash with
>> it. 
>>
>>Before in the dmesg, there is some OOM-Killer activity with httpd.
>>It seems to get something to do with the HIGHMEM and the 4GB of RAM.
>>  
>>
> I don't know if its related directly with your troubles but I saw this
> in the 2.6.11 changelog:

This sounds exactly the problem, thanks for the pointers Erwan.

-- 
Chmouel Boudjnah - Squiz.net - http://www.squiz.net/



From Birger.Wathne at ift.uib.no  Tue Feb 21 08:42:36 2006
From: Birger.Wathne at ift.uib.no (Birger Wathne)
Date: Tue, 21 Feb 2006 09:42:36 +0100
Subject: [Linux-cluster] FC4 repository and kernel modules
Message-ID: <43FAD27C.60600@ift.uib.no>

I tried installing a new FC4 cluster node yesterday.

install of the kernel modules (cman, dlm and GFS) fail because they 
depend on a kernel that doesn't seem to be there anymore.

Shouldn't either someone make sure the kernel modules get recompiled for 
the latest released kernel, or make sure the kernel these modules depend 
on doesn't get removed?

I solved the problem by fetching the kernel I needed from the yum cache 
on another cluster node.

-- 
birger



From Alain.Moulle at bull.net  Tue Feb 21 09:15:46 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 21 Feb 2006 10:15:46 +0100
Subject: [Linux-cluster] Any idea on a stop problem with CS4 ?
Message-ID: <43FADA42.5030701@bull.net>

Hi

We use a 2 nodes cluster to manage failover services via dedicated scripts.
Using clusvcadm -r <service_name> to migrate a service from one node
to the other, it happens from time to time that the CS4 is stuck with
"service_name stopping" diagnostic.
The stop target of the script associated with the service is not called. Subsequent
clusvcadm -d <service_name> calls return a success diagnostic but do
effectively strictly nothing : the service script is not called.

The  only way we found to solve the problem is to restart the cluster suite
daemons (ccsd, cman, fenced & rgmanager) on both nodes.

Often, this does not work too and it is necessary to reboot both nodes.
If we reboot only the apparent failing node, the cluster fails immediatly again
in the dead-lock situation as if information was taken from some memory of the
paired node.

Any idea on this problem ?
Thanks
Alain Moull?




From Birger.Wathne at ift.uib.no  Tue Feb 21 12:51:36 2006
From: Birger.Wathne at ift.uib.no (Birger Wathne)
Date: Tue, 21 Feb 2006 13:51:36 +0100
Subject: [Linux-cluster] Re: generic fencing, stonith,	etc [was: gfs and
	fencing]
In-Reply-To: <1132167863.31123.24.camel@ayanami.boston.redhat.com>
References: <B6A0A04D59978745A68272143BE55BD42CA596@laxmsex01.corp.jettis.com>	<1132091070.16174.54.camel@ayanami.boston.redhat.com>	<437AF879.2070205@veles.ru>
	<1132167863.31123.24.camel@ayanami.boston.redhat.com>
Message-ID: <43FB0CD8.4080706@ift.uib.no>

Lon Hohberger wrote:

>Certainly, the 'ssh' agent could automatically recover in a few more
>cases than the 'manual' agent ('meatware' in Linux-HA lingo).
>  
>
And with multi-level fencing, ssh could be tried first, then the 
meatware could be stirred...

>
>Someone mentioned recently (elsewhere) that it would be great if we had
>a fencing agent which just called some user-specified command to do
>things (and could substitute variables if necessary, like %p ->
>password, %l -> login, etc.):
>  
>
I agree that this would be very nice :-)

>(a) I do not think this is not a supportable solution, due to the
>limitless array of possible configurations for hardware we have never
>heard of.  All support for this agent would likely be limited to this
>mailing list and bugs in the actual agent itself.
>  
>
I don't think that would be a problem. Most (maybe not all) admins 
should be smart enough to understand that you cannot support the command 
used within this agent even if the agent is supported.

>(b) Rather than developing a proper fence agent for their particular
>hardware, people will use this as a means to an end, which will not
>improve the linux-cluster project as a whole.  That sucks :(
>  
>
Is there any good documentation on how to make a fencing agent? What 
kind of info the agent can get from the cluster service, etc... Would be 
great! I would like to automatically find a list of scsi-devices shared 
between 2 systems in a 2-node cluster so I could hack up a fencing 
device that used scsi reservation. Combine this with the stonith option 
in the scsi driver and I would have a nice fence for cheap clusters. If 
I have to manually configure the list of devices it would be easier to 
use the proposed fence agent to run some existing command-line tool.

>So, given that it is not up to me, who else wants this (I'm talking to
>you lurkers out there).  I am waiting for the flaming mantis to burn me
>for even suggesting such a thing...
>
I certainly want it. It would fill an immediate need to get up and 
running with better fencing than the manual one. Something as simple as 
ssh would have a better chance at getting the system quickly back to 
service than waiting for the meat...

-- 
birger



From suvankar_moitra at yahoo.com  Tue Feb 21 12:57:11 2006
From: suvankar_moitra at yahoo.com (SUVANKAR MOITRA)
Date: Tue, 21 Feb 2006 04:57:11 -0800 (PST)
Subject: [Linux-cluster] oracle 10g installation on RHCS-4
Message-ID: <20060221125711.86256.qmail@web52302.mail.yahoo.com>

dear sir,

I want to install oracle 10g on Rhcs-4 cluster
active/passive.Pl send me a document for the above
installation .

Pl help me its very very urgent.

regards

Suvankar MOitra
+91-033-9830152623

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From lhh at redhat.com  Tue Feb 21 14:40:02 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 21 Feb 2006 09:40:02 -0500
Subject: [Linux-cluster] Any idea on a stop problem with CS4 ?
In-Reply-To: <43FADA42.5030701@bull.net>
References: <43FADA42.5030701@bull.net>
Message-ID: <1140532802.19886.8.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-21 at 10:15 +0100, Alain Moulle wrote:
> Hi
> 
> We use a 2 nodes cluster to manage failover services via dedicated scripts.
> Using clusvcadm -r <service_name> to migrate a service from one node
> to the other, it happens from time to time that the CS4 is stuck with
> "service_name stopping" diagnostic.

Could you let us know:

- architecture
- dlm-kernel package version
- rgmanager version
- service XML structure
- if possible, the service script itself (though this is the least
likely problem)

If you can, install the corresponding -debuginfo packages so we can get
a backtrace of the rgmanager daemon.


> The stop target of the script associated with the service is not called. Subsequent
> clusvcadm -d <service_name> calls return a success diagnostic but do
> effectively strictly nothing : the service script is not called.

There's a segfault (which is fixed in RHCS4U3 beta and CVS) which might
explain the behavior.

-- Lon




From ben.yarwood at juno.co.uk  Tue Feb 21 15:56:01 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Tue, 21 Feb 2006 15:56:01 -0000
Subject: [Linux-cluster] Fence agents not working as expected
In-Reply-To: <1140103401.19355.100.camel@ayanami.boston.redhat.com>
Message-ID: <004301c636ff$510bb540$3964a8c0@WS076>

Thank you for the info and everything now works fine.




Ben Yarwood
Technical Director
Juno Records
t - 020 7424 2804
m - 07930 922 333
e - ben.yarwood at juno.co.uk 


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: 16 February 2006 15:23
To: linux clustering
Subject: RE: [Linux-cluster] Fence agents not working as expected

On Thu, 2006-02-16 at 14:49 +0000, Ben Yarwood wrote:
> Ok progress, but still not working
> 
> Feb 16 10:34:33 jrmedia-b kernel: CMAN: node jrmedia-a has been 
> removed from the cluster : Missed too many heartbeats Feb 16 10:34:34 
> jrmedia-b fenced[2055]: jrmedia-a not a cluster member after 0 sec 
> post_fail_delay Feb 16 10:34:34 jrmedia-b fenced[2055]: fencing node 
> "jrmedia-a"
> Feb 16 10:34:37 jrmedia-b fenced[2055]: agent "fence_wti" reports: 
> parse
> error: unknown option "ipaddress=10.0.1.40" 
> 
> I changed ipaddress= to ipaddr= and now I get the following
> 
> Feb 16 14:44:20 jrmedia-a fenced[2110]: agent "fence_wti" reports: failed:
> no operation specified

This was fixed awhile ago.  The default was supposed to be "reboot"; and is
fixed in current CVS and ought to have been fixed in RHCS4U2.

Add option="off" for now.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162805
http://rhn.redhat.com/errata/RHBA-2005-737.html

-- Lon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster






From depeecmr at yahoo.com  Tue Feb 21 18:36:57 2006
From: depeecmr at yahoo.com (Daniel EPEE LEA)
Date: Tue, 21 Feb 2006 10:36:57 -0800 (PST)
Subject: [Linux-cluster] Cluster for  not cluster aware application
In-Reply-To: <004301c636ff$510bb540$3964a8c0@WS076>
Message-ID: <20060221183657.63445.qmail@web30204.mail.mud.yahoo.com>

Hello,

I have a SAN + 2 nodes GFS cluster. 

I was wondering how to install an anplication that is
not  cluster aware. Anybody done that ? Please forward
me to any usefull doc.

Thanks

Much Regards,

Daniel

-----------------------------------------------------------------------------
T O    G O D       B E        T H E         G L O R Y     :)
------------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From cfeist at redhat.com  Tue Feb 21 19:54:46 2006
From: cfeist at redhat.com (Chris Feist)
Date: Tue, 21 Feb 2006 13:54:46 -0600
Subject: [Linux-cluster] FC4 repository and kernel modules
In-Reply-To: <43FAD27C.60600@ift.uib.no>
References: <43FAD27C.60600@ift.uib.no>
Message-ID: <43FB7006.6020704@redhat.com>

We do our best to keep our FC4 kernel packages in sync with the kernel 
packages, but it isn't always possible.  The current *-kernel packages are in 
fedora updates-testing and should be moved to fedora-updates by tomorrow.

Thanks,
Chris

Birger Wathne wrote:
> I tried installing a new FC4 cluster node yesterday.
> 
> install of the kernel modules (cman, dlm and GFS) fail because they 
> depend on a kernel that doesn't seem to be there anymore.
> 
> Shouldn't either someone make sure the kernel modules get recompiled for 
> the latest released kernel, or make sure the kernel these modules depend 
> on doesn't get removed?
> 
> I solved the problem by fetching the kernel I needed from the yum cache 
> on another cluster node.
> 



From javiermarasco at yahoo.com.ar  Tue Feb 21 18:18:47 2006
From: javiermarasco at yahoo.com.ar (javier marasco)
Date: Tue, 21 Feb 2006 15:18:47 -0300
Subject: [Linux-cluster] About writing on the FS
Message-ID: <200602220134.k1M1YY4n011811@mx1.redhat.com>

Hi , im searching information on how to the linux kernel writes into de FS
(local HD) , i mean , if i run an application , and this application writes
somthing on the HD , how it happend? , the steps , the syscalls imlpicated
on the process.
Please , if anyone have the procces by the kernel writes on HD (locally or
on a SAN) , please explain to me or givme the link and i read.
 
Thanks a lot , and excuse me if this is not the correct place to request
this.
 

Antonio Javier Marasco
Service IT Solutions
www.serviceit.com.ar <http://www.serviceit.com.ar/> 
Tel.: 5277-3555 Fax: 4322-3555
Esmeralda 719 Piso 6
CP: C1007ABG
Capital Federal - Argentina

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060221/9026a2b4/attachment.htm>

From yeheyis at gmail.com  Wed Feb 22 03:39:29 2006
From: yeheyis at gmail.com (Yeheyis Workeneh)
Date: Tue, 21 Feb 2006 19:39:29 -0800
Subject: [Linux-cluster] Service try to start before disk mount and failed
In-Reply-To: <1140473912.3048.6.camel@ayanami.boston.redhat.com>
References: <2bc.5415bc0.3128c8dc@aol.com>
	<1140473912.3048.6.camel@ayanami.boston.redhat.com>
Message-ID: <3c380f2b0602211939we8e3762jcc4d5d8bfbdf9a40@mail.gmail.com>

Hi,


I am using fail over cluster with share storage for Oracle Db and another
application. Actually by putting reasonable time delay on the script, I am
able to delay the service to start after the mount point completed.



The service script are in the /etc/init.d in each system but the database
and another application are running on the share storage.



On 2/20/06, Lon Hohberger <lhh at redhat.com> wrote:
>
> On Sat, 2006-02-18 at 14:00 -0500, DYM3FYEHE at aol.com wrote:
> > I am using RHCS V4 for Oracle database and another application running
> > on the two node failover cluster system. The application has different
> > services and installed in the shared storage. The cluster manager
> > starts the services before the shared drives mounted to the second
> > node. I use script to start and stop the service with time delay but
> > still the service start before the shared storage moved to the second
> > node and failed. Is any body knows how I can control the storage
> > resource and the service? I want to make the service to wait until the
> > shared storage available in the second node.
>
> Hi,
>
> I'm not exactly sure what you're trying to do.  Shared storage is
> expected to be concurrently accessible by all nodes at the same time.
> Are you trying to use one service as shared storage for another service?
>
> Having a delay can be racy...
>
> I'm sure there's an answer to your question. :)
>
> -- Lon
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060221/54fbc9cb/attachment.htm>

From erwan at seanodes.com  Wed Feb 22 08:57:58 2006
From: erwan at seanodes.com (Velu Erwan)
Date: Wed, 22 Feb 2006 09:57:58 +0100
Subject: [Linux-cluster] GFS Crash
In-Reply-To: <87accl7hed.fsf@paris.squiz.net>
References: <87bqx28n2n.fsf@paris.squiz.net> <43F9948B.5000302@seanodes.com>
	<87accl7hed.fsf@paris.squiz.net>
Message-ID: <43FC2796.30509@seanodes.com>

Chmouel Boudjnah a ?crit :

>This sounds exactly the problem, thanks for the pointers Erwan.
>
>  
>
You're welcome ;)
If you try a patch derivated from the 2.6.11 or even the 2.6.11 or 
later, please let us know if it had solved your troubles.

See you,
Erwan



From cboudjnah at squiz.net  Wed Feb 22 10:04:37 2006
From: cboudjnah at squiz.net (Chmouel Boudjnah)
Date: Wed, 22 Feb 2006 21:04:37 +1100
Subject: [Linux-cluster] GFS Crash
In-Reply-To: <43FC2796.30509@seanodes.com> (Velu Erwan's message of "Wed,
	22 Feb 2006 09:57:58 +0100")
References: <87bqx28n2n.fsf@paris.squiz.net> <43F9948B.5000302@seanodes.com>
	<87accl7hed.fsf@paris.squiz.net> <43FC2796.30509@seanodes.com>
Message-ID: <87hd6rzp3e.fsf@paris.squiz.net>

Velu Erwan <erwan at seanodes.com> writes:

> You're welcome ;)
> If you try a patch derivated from the 2.6.11 or even the 2.6.11 or
> later, please let us know if it had solved your troubles.

the first one was already applied on a std redhat kernel. I got the
second one thought, it seems that we have a directory with a lot of
files in it and seems take over for httpd when it looksup inside the
files.

With the patch applied i don't seems to have the problems anymore but
i made a lot of tweaking on it so i am not sure if is a cause and
effect situation.

We are simply planning to workaround the creations of theses files in
our app. 

Is there people using GFS with directories that contains a lot of
files on it ?

-- 
Chmouel Boudjnah - Squiz.net - http://www.squiz.net/



From yfttyfs at gmail.com  Wed Feb 22 12:04:07 2006
From: yfttyfs at gmail.com (y f)
Date: Wed, 22 Feb 2006 20:04:07 +0800
Subject: [Linux-cluster] About writing on the FS
In-Reply-To: <200602220134.k1M1YY4n011811@mx1.redhat.com>
References: <200602220134.k1M1YY4n011811@mx1.redhat.com>
Message-ID: <78fcc84a0602220404y7dc7ebd4j7acc4a1ef569e31a@mail.gmail.com>

for linux-2.6.13-1.1526_FC4

1. syscall to buffer_header:

syscall_table.S :: sys_call_table ->; read_write.c :: sys_write( ) ->;
read_write.c :: vfs_write( ) ->; ext3/file.c :: file->;f_op->;write( ) ==
read_write.c :: do_sync_write( ) ->; ext3/file.c :: filp->;f_op->;aio_write(
) == ext3_file_write -->; filemap.c :: generic_file_aio_write( ) ->;
filemapc :: __generic_file_aio_write_noblock( ) ->; filemap.c ::
generic_file_buffered_write( ) ->; ext3/inode.c :: a_ops->;commit_write( )
== ext3/inode.c :: ext3_writeback_commit_write( ) ->; buffer.c
:: generic_commit_write( ) ->; buffer.c :: __block_commit_write( ) ->;
buffer.c :: mark_buffer_dirty( ) ->; page-writeback.c ::
__set_page_dirty_nobuffers( ) ->; fs-writeback.c :: __mark_inode_dirty( )

2. buffer_header to block layer

ext3/fsync.c :: ext3_sync_file( ) ->; fs-writeback.c :: sync_inode( ) ->;
fs-writeback.c :: __writeback_single_inode( ) ->; fs-writeback.c ::
__sync_single_inode(
) ->; page-writeback.c :: do_writepages( ) ->; mpage.h ::
generic_writepages( ) ->; mpage.c :: mpage_writepages( ) ->;
a_ops->;writepage( ) == ext3_ordered_writepage( ) ->; buffer.c ::
block_write_full_page( ) ->; buffer.c :: __block_write_full_page( ) ->;
buffer.c :: submit_bh( ) ->; ll_rw_blk.c :: submit_bio( ) ->; ll_rw_blk.c ::
generic_make_request( )

The success/failure status of the request, along with notification of
completion, is delivered asynchronously through the bio->;bi_end_io
function, which is bio->;bi_end_io = end_bio_bh_io_sync( ) of buffer.c

3. block layer to IDE layer
Todo

Is above help ?

/yf


On 2/22/06, javier marasco <javiermarasco at yahoo.com.ar> wrote:
>
> Hi , im searching information on how to the linux kernel writes into de FS
> (local HD) , i mean , if i run an application , and this application writes
> somthing on the HD , how it happend? , the steps , the syscalls imlpicated
> on the process.
> Please , if anyone have the procces by the kernel writes on HD (locally or
> on a SAN) , please explain to me or givme the link and i read.
>
> Thanks a lot , and excuse me if this is not the correct place to request
> this.
>
>
> *Antonio Javier Marasco
> *Service IT Solutions
> *www.serviceit.com.ar
> *Tel.: 5277-3555 Fax: 4322-3555
> Esmeralda 719 Piso 6
> CP: C1007ABG
> Capital Federal - Argentina
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060222/3ef4ae25/attachment.htm>

From Birger.Wathne at ift.uib.no  Wed Feb 22 13:22:32 2006
From: Birger.Wathne at ift.uib.no (Birger Wathne)
Date: Wed, 22 Feb 2006 14:22:32 +0100
Subject: [Linux-cluster] FC4 repository and kernel modules
In-Reply-To: <43FB7006.6020704@redhat.com>
References: <43FAD27C.60600@ift.uib.no> <43FB7006.6020704@redhat.com>
Message-ID: <43FC6598.7030906@ift.uib.no>

Chris Feist wrote:

> We do our best to keep our FC4 kernel packages in sync with the kernel 
> packages, but it isn't always possible.  The current *-kernel packages 
> are in fedora updates-testing and should be moved to fedora-updates by 
> tomorrow.

Great :-)
Nice to know this is being actively followed up. It would be nice if old 
kernels didn't disappear until all kernel modules had been updated.

I'll upgrade my kernel modules tomorrow, then.

-- 
birger



From caro_aviv at diligent.com  Wed Feb 22 18:19:11 2006
From: caro_aviv at diligent.com (Caro, Aviv)
Date: Wed, 22 Feb 2006 20:19:11 +0200
Subject: [Linux-cluster] GFS cache granularity
Message-ID: <453D02254A9EBC45866DBF28FECEA46F0388F4@ILEX01.corp.diligent.com>

What is the cache granularity of the GFS? If a file is modified by one
of the nodes in the cluster, does it mean that other node's cache
(related to this file), is invalidated?    

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060222/6e0b8547/attachment.htm>

From baostr at gmail.com  Wed Feb 22 22:10:38 2006
From: baostr at gmail.com (Boris Ostrovsky)
Date: Wed, 22 Feb 2006 17:10:38 -0500
Subject: [Linux-cluster] snapshots under CLVM
Message-ID: <fd8dc0d20602221410p29c677d1j26ae7a906969e84@mail.gmail.com>

Hello,


I poked around the archives and it seems that clustered snapshots
are not yet supported. Are there any plans to do this?

Thanks
-boris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060222/d2c96f2e/attachment.htm>

From dhe at redhat.com  Thu Feb 23 00:17:53 2006
From: dhe at redhat.com (Dawson J He)
Date: Thu, 23 Feb 2006 10:17:53 +1000
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 22, Issue 23
In-Reply-To: <20060222170007.1D73973601@hormel.redhat.com>
References: <20060222170007.1D73973601@hormel.redhat.com>
Message-ID: <43FCFF31.1050001@redhat.com>

linux-cluster-request at redhat.com wrote:

>Send Linux-cluster mailing list submissions to
>	linux-cluster at redhat.com
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	https://www.redhat.com/mailman/listinfo/linux-cluster
>or, via email, send a message with subject or body 'help' to
>	linux-cluster-request at redhat.com
>
>You can reach the person managing the list at
>	linux-cluster-owner at redhat.com
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Linux-cluster digest..."
>
>
>Today's Topics:
>
>   1. Cluster for  not cluster aware application (Daniel EPEE LEA)
>   2. Re: FC4 repository and kernel modules (Chris Feist)
>   3. About writing on the FS (javier marasco)
>   4. Re: Service try to start before disk mount and failed
>      (Yeheyis Workeneh)
>   5. Re: GFS Crash (Velu Erwan)
>   6. Re: GFS Crash (Chmouel Boudjnah)
>   7. Re: About writing on the FS (y f)
>   8. Re: FC4 repository and kernel modules (Birger Wathne)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 21 Feb 2006 10:36:57 -0800 (PST)
>From: Daniel EPEE LEA <depeecmr at yahoo.com>
>Subject: [Linux-cluster] Cluster for  not cluster aware application
>To: linux clustering <linux-cluster at redhat.com>
>Message-ID: <20060221183657.63445.qmail at web30204.mail.mud.yahoo.com>
>Content-Type: text/plain; charset=iso-8859-1
>
>Hello,
>
>I have a SAN + 2 nodes GFS cluster. 
>
>I was wondering how to install an anplication that is
>not  cluster aware. Anybody done that ? Please forward
>me to any usefull doc.
>
>Thanks
>
>Much Regards,
>
>Daniel
>
>-----------------------------------------------------------------------------
>T O    G O D       B E        T H E         G L O R Y     :)
>------------------------------------------------------------------------------
>
>__________________________________________________
>Do You Yahoo!?
>Tired of spam?  Yahoo! Mail has the best spam protection around 
>http://mail.yahoo.com 
>
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 21 Feb 2006 13:54:46 -0600
>From: Chris Feist <cfeist at redhat.com>
>Subject: Re: [Linux-cluster] FC4 repository and kernel modules
>To: linux clustering <linux-cluster at redhat.com>,
>	Birger.Wathne at ift.uib.no
>Message-ID: <43FB7006.6020704 at redhat.com>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>We do our best to keep our FC4 kernel packages in sync with the kernel 
>packages, but it isn't always possible.  The current *-kernel packages are in 
>fedora updates-testing and should be moved to fedora-updates by tomorrow.
>
>Thanks,
>Chris
>
>Birger Wathne wrote:
>  
>
>>I tried installing a new FC4 cluster node yesterday.
>>
>>install of the kernel modules (cman, dlm and GFS) fail because they 
>>depend on a kernel that doesn't seem to be there anymore.
>>
>>Shouldn't either someone make sure the kernel modules get recompiled for 
>>the latest released kernel, or make sure the kernel these modules depend 
>>on doesn't get removed?
>>
>>I solved the problem by fetching the kernel I needed from the yum cache 
>>on another cluster node.
>>
>>    
>>
>
>
>
>------------------------------
>
>Message: 3
>Date: Tue, 21 Feb 2006 15:18:47 -0300
>From: "javier marasco" <javiermarasco at yahoo.com.ar>
>Subject: [Linux-cluster] About writing on the FS
>To: "'linux clustering'" <linux-cluster at redhat.com>
>Message-ID: <200602220134.k1M1YY4n011811 at mx1.redhat.com>
>Content-Type: text/plain; charset="us-ascii"
>
>Hi , im searching information on how to the linux kernel writes into de FS
>(local HD) , i mean , if i run an application , and this application writes
>somthing on the HD , how it happend? , the steps , the syscalls imlpicated
>on the process.
>Please , if anyone have the procces by the kernel writes on HD (locally or
>on a SAN) , please explain to me or givme the link and i read.
> 
>Thanks a lot , and excuse me if this is not the correct place to request
>this.
> 
>
>Antonio Javier Marasco
>Service IT Solutions
>www.serviceit.com.ar <http://www.serviceit.com.ar/> 
>Tel.: 5277-3555 Fax: 4322-3555
>Esmeralda 719 Piso 6
>CP: C1007ABG
>Capital Federal - Argentina
>
> 
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: https://www.redhat.com/archives/linux-cluster/attachments/20060221/9026a2b4/attachment.html
>
>------------------------------
>
>Message: 4
>Date: Tue, 21 Feb 2006 19:39:29 -0800
>From: "Yeheyis Workeneh" <yeheyis at gmail.com>
>Subject: Re: [Linux-cluster] Service try to start before disk mount
>	and failed
>To: "linux clustering" <linux-cluster at redhat.com>
>Message-ID:
>	<3c380f2b0602211939we8e3762jcc4d5d8bfbdf9a40 at mail.gmail.com>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi,
>
>
>I am using fail over cluster with share storage for Oracle Db and another
>application. Actually by putting reasonable time delay on the script, I am
>able to delay the service to start after the mount point completed.
>
>
>
>The service script are in the /etc/init.d in each system but the database
>and another application are running on the share storage.
>
>
>
>On 2/20/06, Lon Hohberger <lhh at redhat.com> wrote:
>  
>
>>On Sat, 2006-02-18 at 14:00 -0500, DYM3FYEHE at aol.com wrote:
>>    
>>
>>>I am using RHCS V4 for Oracle database and another application running
>>>on the two node failover cluster system. The application has different
>>>services and installed in the shared storage. The cluster manager
>>>starts the services before the shared drives mounted to the second
>>>node. I use script to start and stop the service with time delay but
>>>still the service start before the shared storage moved to the second
>>>node and failed. Is any body knows how I can control the storage
>>>resource and the service? I want to make the service to wait until the
>>>shared storage available in the second node.
>>>      
>>>
>>Hi,
>>
>>I'm not exactly sure what you're trying to do.  Shared storage is
>>expected to be concurrently accessible by all nodes at the same time.
>>Are you trying to use one service as shared storage for another service?
>>
>>Having a delay can be racy...
>>
>>I'm sure there's an answer to your question. :)
>>
>>-- Lon
>>
>>
>>--
>>Linux-cluster mailing list
>>Linux-cluster at redhat.com
>>https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>    
>>
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: https://www.redhat.com/archives/linux-cluster/attachments/20060221/54fbc9cb/attachment.html
>
>------------------------------
>
>Message: 5
>Date: Wed, 22 Feb 2006 09:57:58 +0100
>From: Velu Erwan <erwan at seanodes.com>
>Subject: Re: [Linux-cluster] GFS Crash
>To: linux clustering <linux-cluster at redhat.com>
>Cc: Marcus Nyeholt <mnyeholt at squiz.net>, Moe.Khan at austereo.com.au
>Message-ID: <43FC2796.30509 at seanodes.com>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Chmouel Boudjnah a ?crit :
>
>  
>
>>This sounds exactly the problem, thanks for the pointers Erwan.
>>
>> 
>>
>>    
>>
>You're welcome ;)
>If you try a patch derivated from the 2.6.11 or even the 2.6.11 or 
>later, please let us know if it had solved your troubles.
>
>See you,
>Erwan
>
>
>
>------------------------------
>
>Message: 6
>Date: Wed, 22 Feb 2006 21:04:37 +1100
>From: Chmouel Boudjnah <cboudjnah at squiz.net>
>Subject: Re: [Linux-cluster] GFS Crash
>To: linux clustering <linux-cluster at redhat.com>
>Cc: Marcus Nyeholt <mnyeholt at squiz.net>, Moe.Khan at austereo.com.au
>Message-ID: <87hd6rzp3e.fsf at paris.squiz.net>
>Content-Type: text/plain; charset=us-ascii
>
>Velu Erwan <erwan at seanodes.com> writes:
>
>  
>
>>You're welcome ;)
>>If you try a patch derivated from the 2.6.11 or even the 2.6.11 or
>>later, please let us know if it had solved your troubles.
>>    
>>
>
>the first one was already applied on a std redhat kernel. I got the
>second one thought, it seems that we have a directory with a lot of
>files in it and seems take over for httpd when it looksup inside the
>files.
>
>With the patch applied i don't seems to have the problems anymore but
>i made a lot of tweaking on it so i am not sure if is a cause and
>effect situation.
>
>We are simply planning to workaround the creations of theses files in
>our app. 
>
>Is there people using GFS with directories that contains a lot of
>files on it ?
>
>  
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/aa5b5b7f/attachment.htm>

From tekion at gmail.com  Thu Feb 23 01:07:51 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Wed, 22 Feb 2006 20:07:51 -0500
Subject: [Linux-cluster] Does 2.6.15 kernel include...
Message-ID: <ee9c961f0602221707l5aaa8c3an7e802f8f72c160b4@mail.gmail.com>

Does any one know if 2.6.15 kernel includes GFS? If not, then were can I get
the patch for it? Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060222/bbea4ea2/attachment.htm>

From Alain.Moulle at bull.net  Thu Feb 23 07:44:03 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 23 Feb 2006 08:44:03 +0100
Subject: [Linux-cluster] CS4/ "Monitoring the health of remote power switch"
	?
Message-ID: <43FD67C3.9060606@bull.net>

Hi

In the CS4 Red Hat Documentation chapter
"Red Hat Cluster Manager Overview" , it
is written :
"To monitor the health of the other nodes,
each node monitors the health of remote
power switch if any ,etc."

Is it really possible ?
Where could we configure this ?
Or is it hidden in the CS4 ,I mean
is it automatically done, based on the
fence lines information in cluster.conf,
and if so what exactly is done ? (periodic
ping on the IP adr of remote power switch )

Thanks if you can clarify this point for me.

Alain



From epeelea at gmail.com  Thu Feb 23 08:29:14 2006
From: epeelea at gmail.com (Daniel EPEE LEA)
Date: Thu, 23 Feb 2006 09:29:14 +0100
Subject: [Linux-cluster] GFS crash
Message-ID: <df22854e0602230029x462d211eo3cecaa500c353828@mail.gmail.com>

Hi,
I have 2 node GFS cluster  + SAN

I'im installing cpanel WHM in a cluster, and after adding a sub
interface ( eth0:1 )to one node's interface (eth0) , GFS crashed.

ping node2.mycluster.cm was no more possible. and cman_tool nodes will
show 2 nodes.

Is it possible to sub interface to a GFS cluster node ?  And How do u do it ?

Waiting for answers

Much Regards,

--------------------------
Daniel Epee Lea



From pcaulfie at redhat.com  Thu Feb 23 08:48:37 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 23 Feb 2006 08:48:37 +0000
Subject: [Linux-cluster] GFS crash
In-Reply-To: <df22854e0602230029x462d211eo3cecaa500c353828@mail.gmail.com>
References: <df22854e0602230029x462d211eo3cecaa500c353828@mail.gmail.com>
Message-ID: <43FD76E5.9000506@redhat.com>

Daniel EPEE LEA wrote:
> Hi,
> I have 2 node GFS cluster  + SAN
> 
> I'im installing cpanel WHM in a cluster, and after adding a sub
> interface ( eth0:1 )to one node's interface (eth0) , GFS crashed.
> 
> ping node2.mycluster.cm was no more possible. and cman_tool nodes will
> show 2 nodes.
> 
> Is it possible to sub interface to a GFS cluster node ?  And How do u do it ?
> 
> Waiting for answers


We'll need considerably more information than that. Just adding another
interface to eth0 shouldn't crash the system (and doesn't on any of my machines).

What exactly was the crash? Which lock manager are you using? How many
filesystems were mounted? what were they doing ?

patrick



From epeelea at gmail.com  Thu Feb 23 10:02:40 2006
From: epeelea at gmail.com (Daniel EPEE LEA)
Date: Thu, 23 Feb 2006 11:02:40 +0100
Subject: [Linux-cluster] GFS crash
In-Reply-To: <43FD76E5.9000506@redhat.com>
References: <df22854e0602230029x462d211eo3cecaa500c353828@mail.gmail.com>
	<43FD76E5.9000506@redhat.com>
Message-ID: <df22854e0602230202w23daa047uc16f2abf554341da@mail.gmail.com>

Patrick,

I
> We'll need considerably more information than that. Just adding another
> interface to eth0 shouldn't crash the system (and doesn't on any of my machines).
>
> What exactly was the crash? Which lock manager are you using? How many
> filesystems were mounted? what were they doing ?
What happen is that, as I added a sub interface, the node lost too
many heartbeats and was disconnected from the cluster.

I use dlm lock,

Only one file system is mounted. I read a Syncing failed message.

I have removed the sub interface, and all is good now.

> patrick
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
--------------------------
Daniel Epee Lea



From thaidn at gmail.com  Thu Feb 23 10:04:55 2006
From: thaidn at gmail.com (Thai Duong)
Date: Thu, 23 Feb 2006 17:04:55 +0700
Subject: [Linux-cluster] 9i RAC GFS Certification?
In-Reply-To: <1140469514.4899.47.camel@brilong-lnx>
References: <1140463671.4899.43.camel@brilong-lnx>
	<1140469514.4899.47.camel@brilong-lnx>
Message-ID: <d4e2d9970602230204t15045e26n3540d9adf74e459a@mail.gmail.com>

You are rite. ATM, Oracle only certifies GFS 6.0 on x86 RHEL3 only. Anyway,
we have been running a 9.2.0.7 RAC with GFS 6.0 on ia64 RHEL 3 for months
without no major problem. We had some problems with < 9.2.0.7 patchset but
after upgrading to the current patchset, the system is quite stable.

We have been testing another 9.2.0.7 RAC with GFS 6.1 on ia64 RHEL 4 for two
weeks. So far so good. If things go smoothly in the next two weeks, we will
upgrade our production system to RHEL4. Yeah we know that using something
not certified is dangerous but as far as I know Oracle has no plan to
certify GFS 6.1 on RHEL4. So we must certify it ourselves!

HTH,

Thai Duong.

PS: Brian, your lastname looks like a Vietnamese name. I'm a Vietnamese ;).
On 2/21/06, Brian Long <brilong at cisco.com> wrote:
>
> On Mon, 2006-02-20 at 14:27 -0500, Brian Long wrote:
> > Hello,
> >
> > I'm fairly new to this list.  Could someone point me to the document
> > which states Oracle and Red Hat have certified 9i RAC on GFS?  I'm
> > looking for a certification matrix; for example, I've heard only GLM,
> > not DLM, is certified as the locking manager.  Is this correct?
>
> After reading a few Metalink articles, it appears the only certified
> solution for 9i RAC is GFS 6.0 on RHEL 3 32-bit only.  This severely
> limits things since I was hoping to deploy 9i RAC on RHEL 4 with GFS
> 6.1  :(
>
> /Brian/
> --
>        Brian Long                      |         |           |
>        IT Data Center Systems          |       .|||.       .|||.
>        Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
>        Phone: (919) 392-7363           |   C i s c o   S y s t e m s
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/ad3dd17e/attachment.htm>

From Birger.Wathne at ift.uib.no  Thu Feb 23 10:18:23 2006
From: Birger.Wathne at ift.uib.no (Birger Wathne)
Date: Thu, 23 Feb 2006 11:18:23 +0100
Subject: [Linux-cluster] Ordering of NFS exports
Message-ID: <43FD8BEF.4030104@ift.uib.no>

I have a problem with my cluster that pops up now and then.

Sometimes the ordering of my NFS exports changes. This quickly becomes a 
problem, as some hosts need remote root access to file systems. When 
there is an earlier export matching the client that doesn't allow root 
access things start breaking here...

I would propose that the logic for exporting file systems should 
preserve ordering of exports for each file system. Perhaps by redoing 
all exports for a file system in correct order if any of the exports for 
that file system need to be renewed?

-- 
birger



From brilong at cisco.com  Thu Feb 23 12:33:21 2006
From: brilong at cisco.com (Brian Long)
Date: Thu, 23 Feb 2006 07:33:21 -0500
Subject: [Linux-cluster] 9i RAC GFS Certification?
In-Reply-To: <d4e2d9970602230204t15045e26n3540d9adf74e459a@mail.gmail.com>
References: <1140463671.4899.43.camel@brilong-lnx>
	<1140469514.4899.47.camel@brilong-lnx>
	<d4e2d9970602230204t15045e26n3540d9adf74e459a@mail.gmail.com>
Message-ID: <1140698001.4475.14.camel@brilong-lnx>

On Thu, 2006-02-23 at 17:04 +0700, Thai Duong wrote:
> You are rite. ATM, Oracle only certifies GFS 6.0 on x86 RHEL3 only.
> Anyway, we have been running a 9.2.0.7 RAC with GFS 6.0 on ia64 RHEL 3
> for months without no major problem. We had some problems with <
> 9.2.0.7 patchset but after upgrading to the current patchset, the
> system is quite stable. 
> 
> We have been testing another 9.2.0.7 RAC with GFS 6.1 on ia64 RHEL 4
> for two weeks. So far so good. If things go smoothly in the next two
> weeks, we will upgrade our production system to RHEL4. Yeah we know
> that using something not certified is dangerous but as far as I know
> Oracle has no plan to certify GFS 6.1 on RHEL4. So we must certify it
> ourselves!

Your last statement is incorrect.  Red Hat GFS folks are working with
Oracle to certify 10gR2 on GFS 6.1 on RHEL 4 in the near term; however,
GULM is the only supported lock manager for this certification.  They
require 3 external nodes to the RAC cluster (none of the RAC nodes can
be a standy lock node, for example).

Red Hat told me this was at the insistence of Oracle.  :(

/Brian/

-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From Alain.Moulle at bull.net  Thu Feb 23 14:44:26 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 23 Feb 2006 15:44:26 +0100
Subject: [Linux-cluster] Re: Re: Any idea on a stop problem with CS4 ?
Message-ID: <43FDCA4A.7090808@bull.net>


>>>>We use a 2 nodes cluster to manage failover services via dedicated scripts.
>>>>Using clusvcadm -r <service_name> to migrate a service from one node
>>>>to the other, it happens from time to time that the CS4 is stuck with
>>>>"service_name stopping" diagnostic.

>>Could you let us know:
>>
>>- architecture

Two nodes connected to the backbone through eth0 and by a direct connection
between them through eth1. Hostsname is set on eth0 which is also used as
fencing interface. Heart-beat is also configured on eth0.

>>- dlm-kernel package version : dlm-kernel.2.6.9-37.7.b.3

>>- rgmanager version : rgmanager.1.9.38-0.b.5

>>- service XML structure : what do you mean ? cluster.conf file ?
>>- if possible, the service script itself (though this is the least
>>likely problem)
>>If you can, install the corresponding -debuginfo packages so we can get
>>a backtrace of the rgmanager daemon.
>>
>>
Will do that. At present, the dead-lock does not occur systematically, however
it is frequent.
It can take a while for us to reproduce the problem with debug packages.


>>>>The stop target of the script associated with the service is not called.

>>Subsequent

>>>>clusvcadm -d <service_name> calls return a success diagnostic but do
>>>>effectively strictly nothing : the service script is not called.

>>There's a segfault (which is fixed in RHCS4U3 beta and CVS) which might
>>explain the behavior.

>>-- Lon

-- 



mailto:Alain.Moulle at bull.net
+------------------------------+--------------------------------+
|	Alain Moull?	       	| from France :	04 76 29 75 99  |
|                              	| FAX number  : 04 76 29 72 49  |
| Bull SA		       	|				|
| 1, Rue de Provence  		| Adr  : FREC B1-041            |
| B.P. 208			|				|
| 38432 Echirolles - CEDEX     	| Email: Alain.Moulle at bull.net  |
| France                       	| BCOM : 229 7599               |
+-------------------------------+-------------------------------+




From Daniel.Hennessey at aah.co.uk  Thu Feb 23 15:00:57 2006
From: Daniel.Hennessey at aah.co.uk (Hennessey Daniel)
Date: Thu, 23 Feb 2006 15:00:57 -0000
Subject: [Linux-cluster] 9i RAC GFS Certification?
Message-ID: <AC0FEFE02EB8624EA3CB32C9AA35EF3D0880EDC4@u607st67.aah.co.uk>

Any idea as to how long "near term" is?

Itanium support thwarts us at every turn (and we have 20 of them!), 10g with
GFS 6.1 on RHEL4-ia64 would answer all of my prayers.

cheers

Dan

-----Original Message-----
From: Brian Long [mailto:brilong at cisco.com] 
Sent: 23 February 2006 12:33
To: linux clustering
Subject: Re: [Linux-cluster] 9i RAC GFS Certification?

On Thu, 2006-02-23 at 17:04 +0700, Thai Duong wrote:
> You are rite. ATM, Oracle only certifies GFS 6.0 on x86 RHEL3 only.
> Anyway, we have been running a 9.2.0.7 RAC with GFS 6.0 on ia64 RHEL 3
> for months without no major problem. We had some problems with <
> 9.2.0.7 patchset but after upgrading to the current patchset, the
> system is quite stable. 
> 
> We have been testing another 9.2.0.7 RAC with GFS 6.1 on ia64 RHEL 4
> for two weeks. So far so good. If things go smoothly in the next two
> weeks, we will upgrade our production system to RHEL4. Yeah we know
> that using something not certified is dangerous but as far as I know
> Oracle has no plan to certify GFS 6.1 on RHEL4. So we must certify it
> ourselves!

Your last statement is incorrect.  Red Hat GFS folks are working with
Oracle to certify 10gR2 on GFS 6.1 on RHEL 4 in the near term; however,
GULM is the only supported lock manager for this certification.  They
require 3 external nodes to the RAC cluster (none of the RAC nodes can
be a standy lock node, for example).

Red Hat told me this was at the insistence of Oracle.  :(

/Brian/

-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

************************************************************************
DISCLAIMER
The information contained in this e-mail is confidential and is intended
for the recipient only.
If you have received it in error, please notify us immediately by reply 
e-mail and then delete it from your system. Please do not copy it or
use it for any other purposes, or disclose the content of the e-mail
to any other person or store or copy the information in any medium. 
The views contained in this e-mail are those of the author and not 
necessarily those of Admenta UK Group.
************************************************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/fa24d724/attachment.htm>

From mbrookov at mines.edu  Thu Feb 23 15:00:48 2006
From: mbrookov at mines.edu (Matt Brookover)
Date: Thu, 23 Feb 2006 08:00:48 -0700
Subject: [Linux-cluster] 9i RAC GFS Certification?
In-Reply-To: <1140698001.4475.14.camel@brilong-lnx>
References: <1140463671.4899.43.camel@brilong-lnx>
	<1140469514.4899.47.camel@brilong-lnx>
	<d4e2d9970602230204t15045e26n3540d9adf74e459a@mail.gmail.com>
	<1140698001.4475.14.camel@brilong-lnx>
Message-ID: <1140706848.886.3.camel@merlin.Mines.EDU>

On Thu, 2006-02-23 at 05:33, Brian Long wrote:
> Your last statement is incorrect.  Red Hat GFS folks are working with
> Oracle to certify 10gR2 on GFS 6.1 on RHEL 4 in the near term; however,
> GULM is the only supported lock manager for this certification.  They
> require 3 external nodes to the RAC cluster (none of the RAC nodes can
> be a standy lock node, for example).
> 
> Red Hat told me this was at the insistence of Oracle.  :(
> 
> /Brian/

Does any body have any details on why Oracle would not want to support
DLM?

I have been under the impression that DLM was the way to go, especially
given the amount of effort that the Rehat team has put into DLM.

Matt




From jbrassow at redhat.com  Thu Feb 23 15:46:39 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Thu, 23 Feb 2006 09:46:39 -0600
Subject: [Linux-cluster] snapshots under CLVM
In-Reply-To: <fd8dc0d20602221410p29c677d1j26ae7a906969e84@mail.gmail.com>
References: <fd8dc0d20602221410p29c677d1j26ae7a906969e84@mail.gmail.com>
Message-ID: <cc56758bf4a669badfd4f5af91aacead@redhat.com>

there are plans, but they won't be implemented for some time.

  brassow

On Feb 22, 2006, at 4:10 PM, Boris Ostrovsky wrote:

> Hello,
>
>
>  I poked around the archives and it seems that clustered snapshots
>  are not yet supported. Are there any plans to do this?
>
>  Thanks
>  -boris
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From tekion at gmail.com  Thu Feb 23 20:21:52 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Thu, 23 Feb 2006 15:21:52 -0500
Subject: [Linux-cluster] problem compiling GFS with FC3...
Message-ID: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>

All,
I am getting errors on dlm-kernel when compiling cluster src on FC3. Here's
what I did before compiling it, downloaded 2.6.15 kernel to /usr/src/linux-
2.6.15. compiled it with O=/root/work/kernel.

download cluster via cvs command:  cvs -d
:pserver:cvs at sources.redhat.com:/cvs/cluster
checkout cluster

cd cluster/magma; ./configure --kernel_src=/root/work/kernel; make && make
install. This went find.
cd ..
./configure --kernel_src=/root/work/kernel;make.  This is where it pukes on
dlm_kernel.


Does any one have any idea what is wrong?  Shoud I be using
--kernel_src=/usr/src/linux-2.6.15 or --kernel_src=/root/work/kernel? In
either case I am having the same error with either vaules for kernel_src.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/3b3e68a2/attachment.htm>

From tekion at gmail.com  Thu Feb 23 21:17:31 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Thu, 23 Feb 2006 16:17:31 -0500
Subject: [Linux-cluster] Re: problem compiling GFS with FC3...
In-Reply-To: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>
References: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>
Message-ID: <ee9c961f0602231317y2ee664bepcd9887a983b1575f@mail.gmail.com>

K, I got past this by giving the stable branch tag. But I am getting is
error:
/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.c: In function
`gfs_write':
/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.c:955: warning:
implicit declaration of function `mutex_lock'
/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.c:955: error:
structure has no member named `i_mutex'
/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.c:960: warning:
implicit declaration of function `mutex_unlock'
/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.c:960: error:
structure has no member named `i_mutex'
make[7]: *** [/root/work/.build/cluster/gfs-kernel/src/gfs/ops_file.o] Error
1
make[6]: *** [_module_/root/work/.build/cluster/gfs-kernel/src/gfs] Error 2
make[5]: *** [modules] Error 2
make[4]: *** [modules] Error 2
make[4]: Leaving directory `/root/work/kernel'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/root/work/.build/cluster/gfs-kernel/src/gfs'
make[2]: *** [install] Error 2
make[2]: Leaving directory `/root/work/.build/cluster/gfs-kernel/src'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/root/work/.build/cluster/gfs-kernel'
make: *** [all] Error 2

Any idea?

On 2/23/06, Screaming Eagle <tekion at gmail.com> wrote:
>
> All,
> I am getting errors on dlm-kernel when compiling cluster src on FC3.
> Here's what I did before compiling it, downloaded 2.6.15 kernel to
> /usr/src/linux-2.6.15. compiled it with O=/root/work/kernel.
>
> download cluster via cvs command:  cvs -d :pserver:cvs at sources.redhat.com
> :/cvs/cluster checkout cluster
>
> cd cluster/magma; ./configure --kernel_src=/root/work/kernel; make && make
> install. This went find.
> cd ..
> ./configure --kernel_src=/root/work/kernel;make.  This is where it pukes
> on dlm_kernel.
>
>
> Does any one have any idea what is wrong?  Shoud I be using
> --kernel_src=/usr/src/linux-2.6.15 or --kernel_src=/root/work/kernel? In
> either case I am having the same error with either vaules for kernel_src.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/3b99156f/attachment.htm>

From cfeist at redhat.com  Thu Feb 23 22:26:05 2006
From: cfeist at redhat.com (Chris Feist)
Date: Thu, 23 Feb 2006 16:26:05 -0600
Subject: [Linux-cluster] problem compiling GFS with FC3...
In-Reply-To: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>
References: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>
Message-ID: <43FE367D.50204@redhat.com>

Screaming Eagle,

Unfortunately the STABLE branch -kernel packages are all built against the 
latest FC5 kernel, which is currently 2.6.15 plus fedora patches.  (Which is 
why it won't compile with the vanilla 2.6.15 kernel).  The easiest thing to do 
is to grab the latest FC5 kernel and build against that.

Thanks,
Chris

Screaming Eagle wrote:
> All,
> I am getting errors on dlm-kernel when compiling cluster src on FC3. 
> Here's what I did before compiling it, downloaded 2.6.15 kernel to 
> /usr/src/linux-2.6.15. compiled it with O=/root/work/kernel.
> 
> download cluster via cvs command:  cvs -d 
> :pserver:cvs at sources.redhat.com:/cvs/cluster checkout cluster
> 
> cd cluster/magma; ./configure --kernel_src=/root/work/kernel; make && 
> make install. This went find.
> cd ..
> ./configure --kernel_src=/root/work/kernel;make.  This is where it pukes 
> on dlm_kernel.
> 
> 
> Does any one have any idea what is wrong?  Shoud I be using 
> --kernel_src=/usr/src/linux-2.6.15 or --kernel_src=/root/work/kernel? In 
> either case I am having the same error with either vaules for kernel_src.
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From tekion at gmail.com  Fri Feb 24 00:41:41 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Thu, 23 Feb 2006 19:41:41 -0500
Subject: [Linux-cluster] problem compiling GFS with FC3...
In-Reply-To: <43FE367D.50204@redhat.com>
References: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>
	<43FE367D.50204@redhat.com>
Message-ID: <ee9c961f0602231641o7e5284fmeb6140c7f71843f@mail.gmail.com>

Do you know where I can get my hands on FC5 kernel, I tried redhat site, but
all I can find is FC3-4. Thanks.

On 2/23/06, Chris Feist <cfeist at redhat.com> wrote:
>
> Screaming Eagle,
>
> Unfortunately the STABLE branch -kernel packages are all built against the
> latest FC5 kernel, which is currently 2.6.15 plus fedora patches.  (Which
> is
> why it won't compile with the vanilla 2.6.15 kernel).  The easiest thing
> to do
> is to grab the latest FC5 kernel and build against that.
>
> Thanks,
> Chris
>
> Screaming Eagle wrote:
> > All,
> > I am getting errors on dlm-kernel when compiling cluster src on FC3.
> > Here's what I did before compiling it, downloaded 2.6.15 kernel to
> > /usr/src/linux-2.6.15. compiled it with O=/root/work/kernel.
> >
> > download cluster via cvs command:  cvs -d
> > :pserver:cvs at sources.redhat.com:/cvs/cluster checkout cluster
> >
> > cd cluster/magma; ./configure --kernel_src=/root/work/kernel; make &&
> > make install. This went find.
> > cd ..
> > ./configure --kernel_src=/root/work/kernel;make.  This is where it pukes
> > on dlm_kernel.
> >
> >
> > Does any one have any idea what is wrong?  Shoud I be using
> > --kernel_src=/usr/src/linux-2.6.15 or --kernel_src=/root/work/kernel? In
> > either case I am having the same error with either vaules for
> kernel_src.
> >
> >
> > ------------------------------------------------------------------------
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/07f82024/attachment.htm>

From deval.kulshrestha at progression.com  Fri Feb 24 04:55:29 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Fri, 24 Feb 2006 10:25:29 +0530
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
Message-ID: <001a01c638fe$8c7bcc90$5f00a8c0@PROGRESSION>

Hi 

I am struggling to get some help on following configuration. This setup is
intended to put live in a data center for 24 x 7 x365, any issue that makes
my environment unstable is very critical here.

 

 

My HA Cluster Setup details

 

1.	HP DL 360 G4p Server                       2nos.
2.	HP MSA 500 G2 (SAN)                     1nos.
3.	RedHat Enterprise Linux 4 ES  
4.	Red Hat Cluster Suite 4

 

Server does have a HP SCSI HBA. MSA 500G2 is a scsi based SAN. Both of these
server are connected to SAN using SCSI VHDCI cable. I used a network switch
to establish network connectivity for the server. created a disk array of
three HDD on SAN with two logical volumes than  I have installed RHEL 4
Update 1 on both server(Servers are configured with RAID 1) than installed
all HP drivers and management agents. After server configuration and OS
installation I have installed Red Hat Cluster Suite v 4 on both the machine.

 

Than I have configured Cluster using Cluster Configuration Manager. Added
member hosts, configured fence device and assigned to member host(HP iLO is
certified as an fence device), Configured Failover domain with node
priority, configured resources such as floating IP address, File System,
Script, than configured service which need to be run in HA mode.

 

After configuring this I have tested with various scenario HA is working
properly, when ever powered off any machine , services fail over on
available node. 

 

Problem:

 

If network goes off on node1, and service which were not running on node1
are started by node1 with shared storage mount point, which was already
running on node 2 but both of these nodes are not able to communicate to
each other, node2 anyway already running the same service with shared
storage mount point. Because of Fencing both of these nodes try to kill each
other. Both of they got hanged up at "Stoping Cluster manager Services.".In
/var/log/messages, it shows fencing s1, fence successful. 

If we disable fencing than 

If network comes back nodes don't synchronize with each other. Shared
storage mount point is available to both the servers. If they try to access
storage at same storage gives IO errors. Hence this entire setup become very
unstable, fragile.

 

 

 

 

 

 

With Regard

Deval

Progression Infonet Pvt. Ltd. 
55, Independent Electronic Modules, 
Sector - 18, Electronic City, 
Gurgaon - 122015

India
Tel          : - 0124 - 2455070, Ext. 215, Fax: 91-124-2398647
Mobile   : - 98186 -82509 
URL        : - www.progression.com 

 

===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060224/2082190c/attachment.htm>

From suvankar_moitra at yahoo.com  Fri Feb 24 05:15:04 2006
From: suvankar_moitra at yahoo.com (SUVANKAR MOITRA)
Date: Thu, 23 Feb 2006 21:15:04 -0800 (PST)
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <001a01c638fe$8c7bcc90$5f00a8c0@PROGRESSION>
Message-ID: <20060224051505.98870.qmail@web52309.mail.yahoo.com>

dear deval,

I am also using the same thing .Did u install oracle
10g on active/passive cluster.If u then pl send me the
doc bcos i was unable to do the installation of oracle
10g on cluster.

regards

Suvankar Moitra
0-9830152623

--- Deval kulshrestha
<deval.kulshrestha at progression.com> wrote:

> Hi 
> 
> I am struggling to get some help on following
> configuration. This setup is
> intended to put live in a data center for 24 x 7
> x365, any issue that makes
> my environment unstable is very critical here.
> 
>  
> 
>  
> 
> My HA Cluster Setup details
> 
>  
> 
> 1.	HP DL 360 G4p Server                       2nos.
> 2.	HP MSA 500 G2 (SAN)                     1nos.
> 3.	RedHat Enterprise Linux 4 ES  
> 4.	Red Hat Cluster Suite 4
> 
>  
> 
> Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> based SAN. Both of these
> server are connected to SAN using SCSI VHDCI cable.
> I used a network switch
> to establish network connectivity for the server.
> created a disk array of
> three HDD on SAN with two logical volumes than  I
> have installed RHEL 4
> Update 1 on both server(Servers are configured with
> RAID 1) than installed
> all HP drivers and management agents. After server
> configuration and OS
> installation I have installed Red Hat Cluster Suite
> v 4 on both the machine.
> 
>  
> 
> Than I have configured Cluster using Cluster
> Configuration Manager. Added
> member hosts, configured fence device and assigned
> to member host(HP iLO is
> certified as an fence device), Configured Failover
> domain with node
> priority, configured resources such as floating IP
> address, File System,
> Script, than configured service which need to be run
> in HA mode.
> 
>  
> 
> After configuring this I have tested with various
> scenario HA is working
> properly, when ever powered off any machine ,
> services fail over on
> available node. 
> 
>  
> 
> Problem:
> 
>  
> 
> If network goes off on node1, and service which were
> not running on node1
> are started by node1 with shared storage mount
> point, which was already
> running on node 2 but both of these nodes are not
> able to communicate to
> each other, node2 anyway already running the same
> service with shared
> storage mount point. Because of Fencing both of
> these nodes try to kill each
> other. Both of they got hanged up at "Stoping
> Cluster manager Services.".In
> /var/log/messages, it shows fencing s1, fence
> successful. 
> 
> If we disable fencing than 
> 
> If network comes back nodes don't synchronize with
> each other. Shared
> storage mount point is available to both the
> servers. If they try to access
> storage at same storage gives IO errors. Hence this
> entire setup become very
> unstable, fragile.
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> With Regard
> 
> Deval
> 
> Progression Infonet Pvt. Ltd. 
> 55, Independent Electronic Modules, 
> Sector - 18, Electronic City, 
> Gurgaon - 122015
> 
> India
> Tel          : - 0124 - 2455070, Ext. 215, Fax:
> 91-124-2398647
> Mobile   : - 98186 -82509 
> URL        : - www.progression.com 
> 
>  
> 
>
===========================================================
> Privileged or confidential information may be
> contained
> in this message. If you are not the addressee
> indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message
> and
> kindly notify the sender by an emailed reply.
> Opinions,
> conclusions and other information in this message
> that
> do not relate to the official business of
> Progression
> and its associate entities shall be understood as
> neither
> given nor endorsed by them.
>   
> 
>
-------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon
> (Haryana), India
> > --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From deval.kulshrestha at progression.com  Fri Feb 24 07:08:52 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Fri, 24 Feb 2006 12:38:52 +0530
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <20060224051505.98870.qmail@web52309.mail.yahoo.com>
Message-ID: <002401c63911$2e047870$5f00a8c0@PROGRESSION>

Dear Suvankar

On my setup, I have to run Mail and Web solution with some additional
stuff(Mysql, Anti virus, Anti spam etc...).
I had never tried oracle on this.

Sorry, 
but do you have robust cluster environment. For me it's still looking very
fragile(in network perspective), if network goes-off, or at peak load when
nodes can not communicate to each other. This makes big chaos.

Regards 
Deval K.



-----Original Message-----
From: SUVANKAR MOITRA [mailto:suvankar_moitra at yahoo.com] 
Sent: Friday, February 24, 2006 10:45 AM
To: Deval kulshrestha; linux clustering
Subject: Re: [Linux-cluster] Network failure results cluster environment
unstable & fragile

dear deval,

I am also using the same thing .Did u install oracle
10g on active/passive cluster.If u then pl send me the
doc bcos i was unable to do the installation of oracle
10g on cluster.

regards

Suvankar Moitra
0-9830152623

--- Deval kulshrestha
<deval.kulshrestha at progression.com> wrote:

> Hi 
> 
> I am struggling to get some help on following
> configuration. This setup is
> intended to put live in a data center for 24 x 7
> x365, any issue that makes
> my environment unstable is very critical here.
> 
>  
> 
>  
> 
> My HA Cluster Setup details
> 
>  
> 
> 1.	HP DL 360 G4p Server                       2nos.
> 2.	HP MSA 500 G2 (SAN)                     1nos.
> 3.	RedHat Enterprise Linux 4 ES  
> 4.	Red Hat Cluster Suite 4
> 
>  
> 
> Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> based SAN. Both of these
> server are connected to SAN using SCSI VHDCI cable.
> I used a network switch
> to establish network connectivity for the server.
> created a disk array of
> three HDD on SAN with two logical volumes than  I
> have installed RHEL 4
> Update 1 on both server(Servers are configured with
> RAID 1) than installed
> all HP drivers and management agents. After server
> configuration and OS
> installation I have installed Red Hat Cluster Suite
> v 4 on both the machine.
> 
>  
> 
> Than I have configured Cluster using Cluster
> Configuration Manager. Added
> member hosts, configured fence device and assigned
> to member host(HP iLO is
> certified as an fence device), Configured Failover
> domain with node
> priority, configured resources such as floating IP
> address, File System,
> Script, than configured service which need to be run
> in HA mode.
> 
>  
> 
> After configuring this I have tested with various
> scenario HA is working
> properly, when ever powered off any machine ,
> services fail over on
> available node. 
> 
>  
> 
> Problem:
> 
>  
> 
> If network goes off on node1, and service which were
> not running on node1
> are started by node1 with shared storage mount
> point, which was already
> running on node 2 but both of these nodes are not
> able to communicate to
> each other, node2 anyway already running the same
> service with shared
> storage mount point. Because of Fencing both of
> these nodes try to kill each
> other. Both of they got hanged up at "Stoping
> Cluster manager Services.".In
> /var/log/messages, it shows fencing s1, fence
> successful. 
> 
> If we disable fencing than 
> 
> If network comes back nodes don't synchronize with
> each other. Shared
> storage mount point is available to both the
> servers. If they try to access
> storage at same storage gives IO errors. Hence this
> entire setup become very
> unstable, fragile.
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> With Regard
> 
> Deval
> 
> Progression Infonet Pvt. Ltd. 
> 55, Independent Electronic Modules, 
> Sector - 18, Electronic City, 
> Gurgaon - 122015
> 
> India
> Tel          : - 0124 - 2455070, Ext. 215, Fax:
> 91-124-2398647
> Mobile   : - 98186 -82509 
> URL        : - www.progression.com 
> 
>  
> 
>
===========================================================
> Privileged or confidential information may be
> contained
> in this message. If you are not the addressee
> indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message
> and
> kindly notify the sender by an emailed reply.
> Opinions,
> conclusions and other information in this message
> that
> do not relate to the official business of
> Progression
> and its associate entities shall be understood as
> neither
> given nor endorsed by them.
>   
> 
>
-------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon
> (Haryana), India
> > --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India



From Alain.Moulle at bull.net  Fri Feb 24 07:31:16 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Fri, 24 Feb 2006 08:31:16 +0100
Subject: [Linux-cluster] CS4 / is there an option "force" ...
Message-ID: <43FEB644.8040906@bull.net>

Hi

On several other HA solutions, there is an option to
force the stop of HA software without stopping all
applications/ressources normally managed by HA.
And when starting again the HA software, it detects
that it was stopped with force option, and does
not manage the applications/ressources (as they
are always running). This option enables the
update of HA software without the need of failover
just fot that.

Is there a likewise option in CS4 ?

And if not, is it planned in future releases ?

Thanks
Alain



From tekion at gmail.com  Fri Feb 24 16:08:25 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Fri, 24 Feb 2006 11:08:25 -0500
Subject: [Linux-cluster] GFS and FC3 installation help...
Message-ID: <ee9c961f0602240808o51e27a1g203b49124098f628@mail.gmail.com>

All,
I am trying to get GFS working on FC3.  So far I have visited this site,
sources.redhat.com/cluster and downloaded the stable version, compiled it
against the plain vaniall kernel 2.6.15, but getting error. I am wondering
if someone had done this with FC3 and got it work.   I am also trying
another alternative, I am downloading all of the relative  src rpm from FC4,
including the kernel. I plan to compile this in FC3. Does any know if this
will be issue regarding glibc and lib differences between FC3 and FC4?
Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060224/5a15411c/attachment.htm>

From cfeist at redhat.com  Fri Feb 24 17:29:44 2006
From: cfeist at redhat.com (Chris Feist)
Date: Fri, 24 Feb 2006 11:29:44 -0600
Subject: [Linux-cluster] problem compiling GFS with FC3...
In-Reply-To: <ee9c961f0602231641o7e5284fmeb6140c7f71843f@mail.gmail.com>
References: <ee9c961f0602231221w2f2d6d60t25e305d58e3cda7e@mail.gmail.com>	
	<43FE367D.50204@redhat.com>
	<ee9c961f0602231641o7e5284fmeb6140c7f71843f@mail.gmail.com>
Message-ID: <43FF4288.5040600@redhat.com>

FC5 is in the development branch.  The latest kernel src rpm is located here:

http://download.fedora.redhat.com/pub//fedora/linux/core/development/i386/SRPMS/kernel-2.6.15-1.1977_FC5.src.rpm

Thanks,
Chris

Screaming Eagle wrote:
> Do you know where I can get my hands on FC5 kernel, I tried redhat site, 
> but all I can find is FC3-4. Thanks.
> 
> On 2/23/06, *Chris Feist* < cfeist at redhat.com 
> <mailto:cfeist at redhat.com>> wrote:
> 
>     Screaming Eagle,
> 
>     Unfortunately the STABLE branch -kernel packages are all built
>     against the
>     latest FC5 kernel, which is currently 2.6.15 plus fedora
>     patches.  (Which is
>     why it won't compile with the vanilla 2.6.15 kernel).  The easiest
>     thing to do
>     is to grab the latest FC5 kernel and build against that.
> 
>     Thanks,
>     Chris
> 
>     Screaming Eagle wrote:
>      > All,
>      > I am getting errors on dlm-kernel when compiling cluster src on FC3.
>      > Here's what I did before compiling it, downloaded 2.6.15 kernel to
>      > /usr/src/linux-2.6.15. compiled it with O=/root/work/kernel.
>      >
>      > download cluster via cvs command:  cvs -d
>      > :pserver:cvs at sources.redhat.com:/cvs/cluster checkout cluster
>      >
>      > cd cluster/magma; ./configure --kernel_src=/root/work/kernel;
>     make &&
>      > make install. This went find.
>      > cd ..
>      > ./configure --kernel_src=/root/work/kernel;make.  This is where
>     it pukes
>      > on dlm_kernel.
>      >
>      >
>      > Does any one have any idea what is wrong?  Shoud I be using
>      > --kernel_src=/usr/src/linux-2.6.15 or
>     --kernel_src=/root/work/kernel? In
>      > either case I am having the same error with either vaules for
>     kernel_src.
>      >
>      >
>      >
>     ------------------------------------------------------------------------
> 
>      >
>      > --
>      > Linux-cluster mailing list
>      > Linux-cluster at redhat.com <mailto:Linux-cluster at redhat.com>
>      > https://www.redhat.com/mailman/listinfo/linux-cluster
>     <https://www.redhat.com/mailman/listinfo/linux-cluster>
> 
> 



From cfeist at redhat.com  Fri Feb 24 17:32:04 2006
From: cfeist at redhat.com (Chris Feist)
Date: Fri, 24 Feb 2006 11:32:04 -0600
Subject: [Linux-cluster] GFS and FC3 installation help...
In-Reply-To: <ee9c961f0602240808o51e27a1g203b49124098f628@mail.gmail.com>
References: <ee9c961f0602240808o51e27a1g203b49124098f628@mail.gmail.com>
Message-ID: <43FF4314.6010903@redhat.com>

I would recommend installing FC4 and the GFS/cluster rpms that come with that. 
  (And when FC5 comes out, use that).  We never really built GFS for FC3 so we 
don't know all of the potential issues that could crop up with different 
libraries, etc.

Thanks,
Chris

Screaming Eagle wrote:
> All,
> I am trying to get GFS working on FC3.  So far I have visited this site, 
> sources.redhat.com/cluster <http://sources.redhat.com/cluster> and 
> downloaded the stable version, compiled it against the plain vaniall 
> kernel 2.6.15, but getting error. I am wondering if someone had done 
> this with FC3 and got it work.   I am also trying another alternative, I 
> am downloading all of the relative  src rpm from FC4, including the 
> kernel. I plan to compile this in FC3. Does any know if this will be 
> issue regarding glibc and lib differences between FC3 and FC4?  Thanks.
> 
> 
> ------------------------------------------------------------------------
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From tekion at gmail.com  Fri Feb 24 17:45:53 2006
From: tekion at gmail.com (Screaming Eagle)
Date: Fri, 24 Feb 2006 12:45:53 -0500
Subject: [Linux-cluster] GFS and FC3 installation help...
In-Reply-To: <43FF4314.6010903@redhat.com>
References: <ee9c961f0602240808o51e27a1g203b49124098f628@mail.gmail.com>
	<43FF4314.6010903@redhat.com>
Message-ID: <ee9c961f0602240945j7291cbd2o3ca176c6ee19b36b@mail.gmail.com>

Yes. That is plan B. I got GS going with FC4 but unfortunately  most of the
servers here is FC3. Any one out there have exp. with this which they would
like to share. Thanks.

On 2/24/06, Chris Feist <cfeist at redhat.com> wrote:
>
> I would recommend installing FC4 and the GFS/cluster rpms that come with
> that.
>   (And when FC5 comes out, use that).  We never really built GFS for FC3
> so we
> don't know all of the potential issues that could crop up with different
> libraries, etc.
>
> Thanks,
> Chris
>
> Screaming Eagle wrote:
> > All,
> > I am trying to get GFS working on FC3.  So far I have visited this site,
> > sources.redhat.com/cluster <http://sources.redhat.com/cluster> and
> > downloaded the stable version, compiled it against the plain vaniall
> > kernel 2.6.15, but getting error. I am wondering if someone had done
> > this with FC3 and got it work.   I am also trying another alternative, I
> > am downloading all of the relative  src rpm from FC4, including the
> > kernel. I plan to compile this in FC3. Does any know if this will be
> > issue regarding glibc and lib differences between FC3 and FC4?  Thanks.
> >
> >
> > ------------------------------------------------------------------------
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060224/37d0ea48/attachment.htm>

From deval.kulshrestha at progression.com  Sat Feb 25 05:33:22 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Sat, 25 Feb 2006 11:03:22 +0530
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <002401c63911$2e047870$5f00a8c0@PROGRESSION>
Message-ID: <001401c639cd$0170b3c0$6e00a8c0@PROGRESSION>

Please help me to resolve my problem 


If network goes off on node1, and service which were not running on node1
 are started by node1 with shared storage mount point, which was already
 running on node 2 but both of these nodes are not  able to communicate to
 each other, node2 anyway already running the same service with shared
 storage mount point. Because of Fencing both of  these nodes try to kill
each other. Both of they got hanged up at "Stoping Cluster manager
Services.".In /var/log/messages, it shows fencing s1, fence successful. 
 
 If we disable fencing than 
 
 If network comes back nodes don't synchronize with each other. Shared
 storage mount point is available to both the servers. If they try to access
 storage at same storage gives IO errors. Hence this entire setup become
very unstable, fragile.

--- Deval kulshrestha
<deval.kulshrestha at progression.com> wrote:

> Hi 
> 
> I am struggling to get some help on following
> configuration. This setup is
> intended to put live in a data center for 24 x 7
> x365, any issue that makes
> my environment unstable is very critical here.
> 
> My HA Cluster Setup details
> 
> 1.	HP DL 360 G4p Server                       2nos.
> 2.	HP MSA 500 G2 (SAN)                     1nos.
> 3.	RedHat Enterprise Linux 4 ES  
> 4.	Red Hat Cluster Suite 4
> 
> 
> Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> based SAN. Both of these
> server are connected to SAN using SCSI VHDCI cable.
> I used a network switch
> to establish network connectivity for the server.
> created a disk array of
> three HDD on SAN with two logical volumes than  I
> have installed RHEL 4
> Update 1 on both server(Servers are configured with
> RAID 1) than installed
> all HP drivers and management agents. After server
> configuration and OS
> installation I have installed Red Hat Cluster Suite
> v 4 on both the machine.
> 
>  
> 
> Than I have configured Cluster using Cluster
> Configuration Manager. Added
> member hosts, configured fence device and assigned
> to member host(HP iLO is
> certified as an fence device), Configured Failover
> domain with node
> priority, configured resources such as floating IP
> address, File System,
> Script, than configured service which need to be run
> in HA mode.
> 
>  
> 
> After configuring this I have tested with various
> scenario HA is working
> properly, when ever powered off any machine ,
> services fail over on
> available node. 
> 
> Problem:
> 
> 
> If network goes off on node1, and service which were
> not running on node1
> are started by node1 with shared storage mount
> point, which was already
> running on node 2 but both of these nodes are not
> able to communicate to
> each other, node2 anyway already running the same
> service with shared
> storage mount point. Because of Fencing both of
> these nodes try to kill each
> other. Both of they got hanged up at "Stoping
> Cluster manager Services.".In
> /var/log/messages, it shows fencing s1, fence
> successful. 
> 
> If we disable fencing than 
> 
> If network comes back nodes don't synchronize with
> each other. Shared
> storage mount point is available to both the
> servers. If they try to access
> storage at same storage gives IO errors. Hence this
> entire setup become very
> unstable, fragile.
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> With Regard
> 
> Deval
> 
> Progression Infonet Pvt. Ltd. 
> 55, Independent Electronic Modules, 
> Sector - 18, Electronic City, 
> Gurgaon - 122015
> 
> India
> Tel          : - 0124 - 2455070, Ext. 215, Fax:
> 91-124-2398647
> Mobile   : - 98186 -82509 
> URL        : - www.progression.com 
> 
>  
> 
>
===========================================================
> Privileged or confidential information may be
> contained
> in this message. If you are not the addressee
> indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message
> and
> kindly notify the sender by an emailed reply.
> Opinions,
> conclusions and other information in this message
> that
> do not relate to the official business of
> Progression
> and its associate entities shall be understood as
> neither
> given nor endorsed by them.
>   
> 
>
-------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon
> (Haryana), India
> > --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India




From shanxiaoyu at gmail.com  Sat Feb 25 11:14:03 2006
From: shanxiaoyu at gmail.com (=?GB2312?B?z/7T7g==?=)
Date: Sat, 25 Feb 2006 19:14:03 +0800
Subject: [Linux-cluster] program hachkd is using a deprecated SCSI ioctl,
	please convert it to SG_IO
Message-ID: <bf1a6bae0602250314u6afd3707ka9d6451ce74ff7c@mail.gmail.com>

Hi,
I meet a problem when I try to configure a cluster software which named lcha
(oem  rose ha),
the service worked normal but I found there were error messages in text
mode:

*Feb 25 18:12:17 Linux02 kernel: program hachkd is using a deprecated SCSI
ioctl, please convert it to SG_IO*

what happened?
--
Family=Father & Mother I Love You!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060225/aa62f37f/attachment.htm>

From deval.kulshrestha at progression.com  Mon Feb 27 05:55:58 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Mon, 27 Feb 2006 11:25:58 +0530
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <2353676.6811141012406600.JavaMail.root@ux-mail>
Message-ID: <001401c63b62$7e51ed50$7600a8c0@PROGRESSION>

Hi 

 

"Network goes off" - If there is very heavy traffic on network. There is no
space for cluster messages to go across all cluster members. This makes real
chaos. I am not able to figure out a method to come out of this.

 

Even on this list, except on user jeff herr, no one else have raised the
issues. 

 

An enterprise class solution can not behave like this.

 

With regards

 

Deval Kulshrestha

 

 

-----Original Message-----
From: Hirantha Wijayawardena [mailto:hirantha at vcs.informatics.lk] 
Sent: Monday, February 27, 2006 9:23 AM
To: deval.kulshrestha at progression.com; linux clustering
Subject: RE: [Linux-cluster] Network failure results cluster environment
unstable & fragile

 

Hi,


I'm not expert on Cluster. I have the same scenario as you do! but I'm not
sure what you meant "Network goes off" - assuming network cable unplugged,
network switch failed or NIC failed - I succeeded with bonding NICs and
redundant switch.


someone may advice with any other way..




Hope this is helpful



Thanks and regards


- Hirantha




----- Original Message -----

From: Deval kulshrestha  <mailto:deval.kulshrestha at progression.com>
<deval.kulshrestha at progression.com>

To: 'linux clustering'  <mailto:linux-cluster at redhat.com>
<linux-cluster at redhat.com>

Cc: 

Date: Saturday, February 25 2006 11:40 AM

Subject: RE: [Linux-cluster] Network failure results cluster environment
unstable & fragile

Please help me to resolve my problem 


If network goes off on node1, and service which were not running on node1
are started by node1 with shared storage mount point, which was already
running on node 2 but both of these nodes are not able to communicate to
each other, node2 anyway already running the same service with shared
storage mount point. Because of Fencing both of these nodes try to kill
each other. Both of they got hanged up at "Stoping Cluster manager
Services.".In /var/log/messages, it shows fencing s1, fence successful. 

If we disable fencing than 

If network comes back nodes don't synchronize with each other. Shared
storage mount point is available to both the servers. If they try to access
storage at same storage gives IO errors. Hence this entire setup become
very unstable, fragile.

--- Deval kulshrestha
<deval.kulshrestha at progression.com> wrote:

> Hi 
> 
> I am struggling to get some help on following
> configuration. This setup is
> intended to put live in a data center for 24 x 7
> x365, any issue that makes
> my environment unstable is very critical here.
> 
> My HA Cluster Setup details
> 
> 1. HP DL 360 G4p Server 2nos.
> 2. HP MSA 500 G2 (SAN) 1nos.
> 3. RedHat Enterprise Linux 4 ES 
> 4. Red Hat Cluster Suite 4
> 
> 
> Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> based SAN. Both of these
> server are connected to SAN using SCSI VHDCI cable.
> I used a network switch
> to establish network connectivity for the server.
> created a disk array of
> three HDD on SAN with two logical volumes than I
> have installed RHEL 4
> Update 1 on both server(Servers are configured with
> RAID 1) than installed
> all HP drivers and management agents. After server
> configuration and OS
> installation I have installed Red Hat Cluster Suite
> v 4 on both the machine.
> 
> 
> 
> Than I have configured Cluster using Cluster
> Configuration Manager. Added
> member hosts, configured fence device and assigned
> to member host(HP iLO is
> certified as an fence device), Configured Failover
> domain with node
> priority, configured resources such as floating IP
> address, File System,
> Script, than configured service which need to be run
> in HA mode.
> 
> 
> 
> After configuring this I have tested with various
> scenario HA is working
> properly, when ever powered off any machine ,
> services fail over on
> available node. 
> 
> Problem:
> 
> 
> If network goes off on node1, and service which were
> not running on node1
> are started by node1 with shared storage mount
> point, which was already
> running on node 2 but both of these nodes are not
> able to communicate to
> each other, node2 anyway already running the same
> service with shared
> storage mount point. Because of Fencing both of
> these nodes try to kill each
> other. Both of they got hanged up at "Stoping
> Cluster manager Services.".In
> /var/log/messages, it shows fencing s1, fence
> successful. 
> 
> If we disable fencing than 
> 
> If network comes back nodes don't synchronize with
> each other. Shared
> storage mount point is available to both the
> servers. If they try to access
> storage at same storage gives IO errors. Hence this
> entire setup become very
> unstable, fragile.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> With Regard
> 
> Deval
> 
> Progression Infonet Pvt. Ltd. 
> 55, Independent Electronic Modules, 
> Sector - 18, Electronic City, 
> Gurgaon - 122015
> 
> India
> Tel : - 0124 - 2455070, Ext. 215, Fax:
> 91-124-2398647
> Mobile : - 98186 -82509 
> URL : - www.progression.com 
> 
> 
> 
>
===========================================================
> Privileged or confidential information may be
> contained
> in this message. If you are not the addressee
> indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message
> and
> kindly notify the sender by an emailed reply.
> Opinions,
> conclusions and other information in this message
> that
> do not relate to the official business of
> Progression
> and its associate entities shall be understood as
> neither
> given nor endorsed by them.
> 
> 
>
-------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon
> (Haryana), India
> > --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.


-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.


-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

 


e'switch 2.7.9 w/ MySQL on RedHat Enterprise 4 in production

 

_________________________________________________________________ 

Disclaimer and Confidentiality

 

This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system manager.
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and delete
this e-mail from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.

===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060227/cc0fb009/attachment.htm>

From francisco_javier.pena at roche.com  Mon Feb 27 08:22:19 2006
From: francisco_javier.pena at roche.com (Pena, Francisco Javier)
Date: Mon, 27 Feb 2006 09:22:19 +0100
Subject: [Linux-cluster] Network failure results cluster
	environmentunstable & fragile
Message-ID: <C0C1791E8EC6F249B5570F01409BD3EE9161EF@rmamsem1.emea.roche.com>

Hi Deval,

If you are using iLO fencing, you could try the latest fence package
(1.32.10). I have seen a similar problem, and it is because recent iLO
firmware versions behave a little different (they try to make a soft
restart instead of a hard reboot). 

At least one of the nodes should get properly killed, and the surviving
one should keep all services.

Hope this helps. Regards,

Javier

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Deval 
> kulshrestha
> Sent: Saturday, February 25, 2006 6:33 AM
> To: 'linux clustering'
> Subject: RE: [Linux-cluster] Network failure results cluster 
> environmentunstable & fragile
> 
> 
> Please help me to resolve my problem 
> 
> 
> If network goes off on node1, and service which were not 
> running on node1  are started by node1 with shared storage 
> mount point, which was already  running on node 2 but both of 
> these nodes are not  able to communicate to  each other, 
> node2 anyway already running the same service with shared  
> storage mount point. Because of Fencing both of  these nodes 
> try to kill each other. Both of they got hanged up at 
> "Stoping Cluster manager Services.".In /var/log/messages, it 
> shows fencing s1, fence successful. 
>  
>  If we disable fencing than 
>  
>  If network comes back nodes don't synchronize with each 
> other. Shared  storage mount point is available to both the 
> servers. If they try to access  storage at same storage gives 
> IO errors. Hence this entire setup become very unstable, fragile.
> 
> --- Deval kulshrestha
> <deval.kulshrestha at progression.com> wrote:
> 
> > Hi
> > 
> > I am struggling to get some help on following
> > configuration. This setup is
> > intended to put live in a data center for 24 x 7
> > x365, any issue that makes
> > my environment unstable is very critical here.
> > 
> > My HA Cluster Setup details
> > 
> > 1.	HP DL 360 G4p Server                       2nos.
> > 2.	HP MSA 500 G2 (SAN)                     1nos.
> > 3.	RedHat Enterprise Linux 4 ES  
> > 4.	Red Hat Cluster Suite 4
> > 
> > 
> > Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> > based SAN. Both of these
> > server are connected to SAN using SCSI VHDCI cable.
> > I used a network switch
> > to establish network connectivity for the server.
> > created a disk array of
> > three HDD on SAN with two logical volumes than  I
> > have installed RHEL 4
> > Update 1 on both server(Servers are configured with
> > RAID 1) than installed
> > all HP drivers and management agents. After server 
> configuration and 
> > OS installation I have installed Red Hat Cluster Suite
> > v 4 on both the machine.
> > 
> >  
> > 
> > Than I have configured Cluster using Cluster
> > Configuration Manager. Added
> > member hosts, configured fence device and assigned
> > to member host(HP iLO is
> > certified as an fence device), Configured Failover
> > domain with node
> > priority, configured resources such as floating IP
> > address, File System,
> > Script, than configured service which need to be run
> > in HA mode.
> > 
> >  
> > 
> > After configuring this I have tested with various
> > scenario HA is working
> > properly, when ever powered off any machine ,
> > services fail over on
> > available node.
> > 
> > Problem:
> > 
> > 
> > If network goes off on node1, and service which were
> > not running on node1
> > are started by node1 with shared storage mount
> > point, which was already
> > running on node 2 but both of these nodes are not
> > able to communicate to
> > each other, node2 anyway already running the same
> > service with shared
> > storage mount point. Because of Fencing both of
> > these nodes try to kill each
> > other. Both of they got hanged up at "Stoping
> > Cluster manager Services.".In
> > /var/log/messages, it shows fencing s1, fence
> > successful.
> > 
> > If we disable fencing than
> > 
> > If network comes back nodes don't synchronize with
> > each other. Shared
> > storage mount point is available to both the
> > servers. If they try to access
> > storage at same storage gives IO errors. Hence this
> > entire setup become very
> > unstable, fragile.
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> > With Regard
> > 
> > Deval
> > 
> > Progression Infonet Pvt. Ltd.
> > 55, Independent Electronic Modules, 
> > Sector - 18, Electronic City, 
> > Gurgaon - 122015
> > 
> > India
> > Tel          : - 0124 - 2455070, Ext. 215, Fax:
> > 91-124-2398647
> > Mobile   : - 98186 -82509 
> > URL        : - www.progression.com 
> > 
> >  
> > 
> >
> ===========================================================
> > Privileged or confidential information may be
> > contained
> > in this message. If you are not the addressee
> > indicated
> > in this message (or responsible for delivery of the
> > message to such person), please delete this message
> > and
> > kindly notify the sender by an emailed reply.
> > Opinions,
> > conclusions and other information in this message
> > that
> > do not relate to the official business of
> > Progression
> > and its associate entities shall be understood as
> > neither
> > given nor endorsed by them.
> >   
> > 
> >
> -------------------------------------------------------------
> > Progression Infonet Private Limited, Gurgaon
> > (Haryana), India
> > > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> >
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 
> 
> 
> 
> ===========================================================
> Privileged or confidential information may be contained
> in this message. If you are not the addressee indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message and
> kindly notify the sender by an emailed reply. Opinions, 
> conclusions and other information in this message that do not 
> relate to the official business of Progression and its 
> associate entities shall be understood as neither given nor 
> endorsed by them.
>   
> 
> -------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon (Haryana), India
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com 
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> ===========================================================
> Privileged or confidential information may be contained
> in this message. If you are not the addressee indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message and
> kindly notify the sender by an emailed reply. Opinions, 
> conclusions and other information in this message that do not 
> relate to the official business of Progression and its 
> associate entities shall be understood as neither given nor 
> endorsed by them.
>   
> 
> -------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon (Haryana), India
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com 
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From deval.kulshrestha at progression.com  Mon Feb 27 08:37:24 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Mon, 27 Feb 2006 14:07:24 +0530
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <C0C1791E8EC6F249B5570F01409BD3EE9161EF@rmamsem1.emea.roche.com>
Message-ID: <000801c63b79$0b652f20$7600a8c0@PROGRESSION>

Hi Javier 
Thanks for your input.
I wanted to know that, Is your issue get resolved using new fence package ?

With regard

Deval K.


-----Original Message-----
From: Pena, Francisco Javier [mailto:francisco_javier.pena at roche.com] 
Sent: Monday, February 27, 2006 1:52 PM
To: Deval kulshrestha; linux clustering
Subject: RE: [Linux-cluster] Network failure results cluster
environmentunstable & fragile

Hi Deval,

If you are using iLO fencing, you could try the latest fence package
(1.32.10). I have seen a similar problem, and it is because recent iLO
firmware versions behave a little different (they try to make a soft
restart instead of a hard reboot). 

At least one of the nodes should get properly killed, and the surviving
one should keep all services.

Hope this helps. Regards,

Javier

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Deval 
> kulshrestha
> Sent: Saturday, February 25, 2006 6:33 AM
> To: 'linux clustering'
> Subject: RE: [Linux-cluster] Network failure results cluster 
> environmentunstable & fragile
> 
> 
> Please help me to resolve my problem 
> 
> 
> If network goes off on node1, and service which were not 
> running on node1  are started by node1 with shared storage 
> mount point, which was already  running on node 2 but both of 
> these nodes are not  able to communicate to  each other, 
> node2 anyway already running the same service with shared  
> storage mount point. Because of Fencing both of  these nodes 
> try to kill each other. Both of they got hanged up at 
> "Stoping Cluster manager Services.".In /var/log/messages, it 
> shows fencing s1, fence successful. 
>  
>  If we disable fencing than 
>  
>  If network comes back nodes don't synchronize with each 
> other. Shared  storage mount point is available to both the 
> servers. If they try to access  storage at same storage gives 
> IO errors. Hence this entire setup become very unstable, fragile.
> 
> --- Deval kulshrestha
> <deval.kulshrestha at progression.com> wrote:
> 
> > Hi
> > 
> > I am struggling to get some help on following
> > configuration. This setup is
> > intended to put live in a data center for 24 x 7
> > x365, any issue that makes
> > my environment unstable is very critical here.
> > 
> > My HA Cluster Setup details
> > 
> > 1.	HP DL 360 G4p Server                       2nos.
> > 2.	HP MSA 500 G2 (SAN)                     1nos.
> > 3.	RedHat Enterprise Linux 4 ES  
> > 4.	Red Hat Cluster Suite 4
> > 
> > 
> > Server does have a HP SCSI HBA. MSA 500G2 is a scsi
> > based SAN. Both of these
> > server are connected to SAN using SCSI VHDCI cable.
> > I used a network switch
> > to establish network connectivity for the server.
> > created a disk array of
> > three HDD on SAN with two logical volumes than  I
> > have installed RHEL 4
> > Update 1 on both server(Servers are configured with
> > RAID 1) than installed
> > all HP drivers and management agents. After server 
> configuration and 
> > OS installation I have installed Red Hat Cluster Suite
> > v 4 on both the machine.
> > 
> >  
> > 
> > Than I have configured Cluster using Cluster
> > Configuration Manager. Added
> > member hosts, configured fence device and assigned
> > to member host(HP iLO is
> > certified as an fence device), Configured Failover
> > domain with node
> > priority, configured resources such as floating IP
> > address, File System,
> > Script, than configured service which need to be run
> > in HA mode.
> > 
> >  
> > 
> > After configuring this I have tested with various
> > scenario HA is working
> > properly, when ever powered off any machine ,
> > services fail over on
> > available node.
> > 
> > Problem:
> > 
> > 
> > If network goes off on node1, and service which were
> > not running on node1
> > are started by node1 with shared storage mount
> > point, which was already
> > running on node 2 but both of these nodes are not
> > able to communicate to
> > each other, node2 anyway already running the same
> > service with shared
> > storage mount point. Because of Fencing both of
> > these nodes try to kill each
> > other. Both of they got hanged up at "Stoping
> > Cluster manager Services.".In
> > /var/log/messages, it shows fencing s1, fence
> > successful.
> > 
> > If we disable fencing than
> > 
> > If network comes back nodes don't synchronize with
> > each other. Shared
> > storage mount point is available to both the
> > servers. If they try to access
> > storage at same storage gives IO errors. Hence this
> > entire setup become very
> > unstable, fragile.
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> > With Regard
> > 
> > Deval
> > 
> > Progression Infonet Pvt. Ltd.
> > 55, Independent Electronic Modules, 
> > Sector - 18, Electronic City, 
> > Gurgaon - 122015
> > 
> > India
> > Tel          : - 0124 - 2455070, Ext. 215, Fax:
> > 91-124-2398647
> > Mobile   : - 98186 -82509 
> > URL        : - www.progression.com 
> > 
> >  
> > 
> >
> ===========================================================
> > Privileged or confidential information may be
> > contained
> > in this message. If you are not the addressee
> > indicated
> > in this message (or responsible for delivery of the
> > message to such person), please delete this message
> > and
> > kindly notify the sender by an emailed reply.
> > Opinions,
> > conclusions and other information in this message
> > that
> > do not relate to the official business of
> > Progression
> > and its associate entities shall be understood as
> > neither
> > given nor endorsed by them.
> >   
> > 
> >
> -------------------------------------------------------------
> > Progression Infonet Private Limited, Gurgaon
> > (Haryana), India
> > > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> >
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 
> 
> 
> 
> ===========================================================
> Privileged or confidential information may be contained
> in this message. If you are not the addressee indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message and
> kindly notify the sender by an emailed reply. Opinions, 
> conclusions and other information in this message that do not 
> relate to the official business of Progression and its 
> associate entities shall be understood as neither given nor 
> endorsed by them.
>   
> 
> -------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon (Haryana), India
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com 
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> ===========================================================
> Privileged or confidential information may be contained
> in this message. If you are not the addressee indicated
> in this message (or responsible for delivery of the 
> message to such person), please delete this message and
> kindly notify the sender by an emailed reply. Opinions, 
> conclusions and other information in this message that do not 
> relate to the official business of Progression and its 
> associate entities shall be understood as neither given nor 
> endorsed by them.
>   
> 
> -------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon (Haryana), India
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com 
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India



From sgray at bluestarinc.com  Mon Feb 27 16:59:52 2006
From: sgray at bluestarinc.com (Sean Gray)
Date: Mon, 27 Feb 2006 11:59:52 -0500
Subject: [Linux-cluster] dlm_sendd
Message-ID: <44033008.9010208@bluestarinc.com>

Has anyone seen high load averages caused by dlm_sendd? If so what is 
the resolution?

I am trying to run Oracle apps and db in a two server config. I can 
provide more detail if needed.

TIA

-- 
Sean N. Gray
Director of Information Technology
United Radio Incorporated, DBA BlueStar
24 Spiral Drive
Florence, Kentucky 41042
office: 859.371.4423 x3263
toll free: 800.371.4423 x3263
fax: 859.371.4425
mobile: 513.616.3379




From teigland at redhat.com  Mon Feb 27 17:15:15 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 27 Feb 2006 11:15:15 -0600
Subject: [Linux-cluster] Does 2.6.15 kernel include...
In-Reply-To: <ee9c961f0602221707l5aaa8c3an7e802f8f72c160b4@mail.gmail.com>
References: <ee9c961f0602221707l5aaa8c3an7e802f8f72c160b4@mail.gmail.com>
Message-ID: <20060227171515.GC25004@redhat.com>

On Wed, Feb 22, 2006 at 08:07:51PM -0500, Screaming Eagle wrote:
> Does any one know if 2.6.15 kernel includes GFS? If not, then were can I get
> the patch for it? Thanks.

This week I'm hoping to make another release from the STABLE branch of cvs
that will build on 2.6.15.

Dave



From binky123 at excite.com  Mon Feb 27 19:54:24 2006
From: binky123 at excite.com (binky123 at excite.com)
Date: Mon, 27 Feb 2006 14:54:24 -0500 (EST)
Subject: [Linux-cluster] GNBD + LVM on FC4
Message-ID: <20060227195424.141F4BF6D@xprdmailfe13.nwk.excite.com>


I'd like some opinions on how to coalesce 6 partitions of 2TB each
over 3 PCs.  So the total filesystem size is about 12TB.  I tried
using gnbd to export from 2 servers and used LVM on the master to
create the 12TB logical volume out of 2 local partitions and 4 gnbd
partitions.  mkfs -t xfs got stuck waiting for
Disk I/O.  I tried several other variations and couldn't make it
work.  I posted the results on the LVM mailing list but wanted to
get other suggestions on what other software can be used. Thanks.




_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!




From devrim at gunduz.org  Mon Feb 27 23:11:03 2006
From: devrim at gunduz.org (Devrim GUNDUZ)
Date: Tue, 28 Feb 2006 01:11:03 +0200 (EET)
Subject: [Linux-cluster] GFS: "transport endpoint is not connected" error
Message-ID: <Pine.LNX.4.63.0602280049230.11714@mail.kivi.com.tr>


Greetings,

I've installed GFS to all members in a a RHEL AS/ES system. This cluster 
is connected to a SAN.

Using one of the nodes, I mkfs'ed all partitions on SAN with GFS. When I 
want to mount those partitions, I get the error on subject. I google'd, 
however I could not find an exact solution to my problem. There are 9 
servers, 2+2 will run as a cluster, and 5 of them will work in standalone 
mode. Neither the standalone, nor the cluster members can mount the 
partitions.

I have another question: The first 2-node cluster needs 4 TB of data. 
However, IBM reported us that we can only create 2 TB LUNs. So, I will 
need to create LVM and combine two partitions, right? Any guides/comments 
on that?

Regards,
--
Devrim GUNDUZ
Kivi Bili?im Teknolojileri         -          http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
                       http://www.gunduz.org

From lhh at redhat.com  Tue Feb 28 01:17:26 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:17:26 -0500
Subject: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <20060221125711.86256.qmail@web52302.mail.yahoo.com>
References: <20060221125711.86256.qmail@web52302.mail.yahoo.com>
Message-ID: <1141089446.13130.55.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-21 at 04:57 -0800, SUVANKAR MOITRA wrote:
> dear sir,
> 
> I want to install oracle 10g on Rhcs-4 cluster
> active/passive.Pl send me a document for the above
> installation .
> 
> Pl help me its very very urgent.
> 
> regards

First draft here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182423

-- Lon




From lhh at redhat.com  Tue Feb 28 01:18:17 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:18:17 -0500
Subject: [Linux-cluster] CS4/ "Monitoring the health of remote power
	switch" ?
In-Reply-To: <43FD67C3.9060606@bull.net>
References: <43FD67C3.9060606@bull.net>
Message-ID: <1141089497.13130.57.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-23 at 08:44 +0100, Alain Moulle wrote:
> Hi
> 
> In the CS4 Red Hat Documentation chapter
> "Red Hat Cluster Manager Overview" , it
> is written :
> "To monitor the health of the other nodes,
> each node monitors the health of remote
> power switch if any ,etc."

It actually doesn't monitor the power switches in RHCS4, but it does
monitor the other members.

-- Lon




From lhh at redhat.com  Tue Feb 28 01:19:57 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:19:57 -0500
Subject: [Linux-cluster] Ordering of NFS exports
In-Reply-To: <43FD8BEF.4030104@ift.uib.no>
References: <43FD8BEF.4030104@ift.uib.no>
Message-ID: <1141089597.13130.60.camel@ayanami.boston.redhat.com>

On Thu, 2006-02-23 at 11:18 +0100, Birger Wathne wrote:
> I have a problem with my cluster that pops up now and then.
> 
> Sometimes the ordering of my NFS exports changes. This quickly becomes a 
> problem, as some hosts need remote root access to file systems. When 
> there is an earlier export matching the client that doesn't allow root 
> access things start breaking here...
> 
> I would propose that the logic for exporting file systems should 
> preserve ordering of exports for each file system. Perhaps by redoing 
> all exports for a file system in correct order if any of the exports for 
> that file system need to be renewed?

Sounds reasonable -- could you file a bugzilla?  

-- Lon



From lhh at redhat.com  Tue Feb 28 01:21:22 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:21:22 -0500
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <001a01c638fe$8c7bcc90$5f00a8c0@PROGRESSION>
References: <001a01c638fe$8c7bcc90$5f00a8c0@PROGRESSION>
Message-ID: <1141089682.13130.62.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-24 at 10:25 +0530, Deval kulshrestha wrote:

> If network goes off on node1, and service which were not running on
> node1 are started by node1 with shared storage mount point, which was
> already running on node 2 but both of these nodes are not able to
> communicate to each other, node2 anyway already running the same
> service with shared storage mount point. Because of Fencing both of
> these nodes try to kill each other. Both of they got hanged up at
> ?Stoping Cluster manager Services??.In /var/log/messages, it shows
> fencing s1, fence successful. 

What are you using as fencing?

-- Lon




From lhh at redhat.com  Tue Feb 28 01:25:58 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:25:58 -0500
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <002401c63911$2e047870$5f00a8c0@PROGRESSION>
References: <002401c63911$2e047870$5f00a8c0@PROGRESSION>
Message-ID: <1141089958.13130.67.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-24 at 12:38 +0530, Deval kulshrestha wrote:
> Dear Suvankar
> 
> On my setup, I have to run Mail and Web solution with some additional
> stuff(Mysql, Anti virus, Anti spam etc...).
> I had never tried oracle on this.
> 
> Sorry, 
> but do you have robust cluster environment. For me it's still looking very
> fragile(in network perspective), if network goes-off, or at peak load when
> nodes can not communicate to each other. This makes big chaos.

You need to play with the timers in /proc/cluster

  /proc/cluster/config/cman/hello_timer
  /proc/cluster/config/cman/deadnode_timeout

Increase the value of deadnode_timeout by 20-40 seconds on each of the
cluster nodes, and it should be more resilient to load increases.

-- Lon



From lhh at redhat.com  Tue Feb 28 01:26:43 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:26:43 -0500
Subject: [Linux-cluster] CS4 / is there an option "force" ...
In-Reply-To: <43FEB644.8040906@bull.net>
References: <43FEB644.8040906@bull.net>
Message-ID: <1141090003.13130.69.camel@ayanami.boston.redhat.com>

On Fri, 2006-02-24 at 08:31 +0100, Alain Moulle wrote:
> Hi
> 
> On several other HA solutions, there is an option to
> force the stop of HA software without stopping all
> applications/ressources normally managed by HA.
> And when starting again the HA software, it detects
> that it was stopped with force option, and does
> not manage the applications/ressources (as they
> are always running). This option enables the
> update of HA software without the need of failover
> just fot that.
> 
> Is there a likewise option in CS4 ?

Nope.

> And if not, is it planned in future releases ?

Not at the moment.

-- Lon




From lhh at redhat.com  Tue Feb 28 01:28:16 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 27 Feb 2006 20:28:16 -0500
Subject: [Linux-cluster] Network failure results cluster
	environmentunstable & fragile
In-Reply-To: <C0C1791E8EC6F249B5570F01409BD3EE9161EF@rmamsem1.emea.roche.com>
References: <C0C1791E8EC6F249B5570F01409BD3EE9161EF@rmamsem1.emea.roche.com>
Message-ID: <1141090096.13130.72.camel@ayanami.boston.redhat.com>

On Mon, 2006-02-27 at 09:22 +0100, Pena, Francisco Javier wrote:
> Hi Deval,
> 
> If you are using iLO fencing, you could try the latest fence package
> (1.32.10). I have seen a similar problem, and it is because recent iLO
> firmware versions behave a little different (they try to make a soft
> restart instead of a hard reboot). 

Also, if you're using iLO fencing, make sure you boot with acpi=off
(see /etc/grub.conf) and have acpid stopped.  (e.g. chkconfig --level
2345 acpid off)

ACPI soft-poweroff (which iLO tries to do) is exactly what you do not
want in a cluster.

-- Lon



From hlawatschek at atix.de  Tue Feb 28 09:14:15 2006
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Tue, 28 Feb 2006 10:14:15 +0100
Subject: [Linux-cluster] GFS: "transport endpoint is not connected" error
In-Reply-To: <Pine.LNX.4.63.0602280049230.11714@mail.kivi.com.tr>
References: <Pine.LNX.4.63.0602280049230.11714@mail.kivi.com.tr>
Message-ID: <200602281014.15703.hlawatschek@atix.de>

Hi,

have you already started your cluster environment (CMAN, fenced, DLM/Gulm) ?
What are the exact steps you have done ?

Mark  

On Tuesday 28 February 2006 00:11, Devrim GUNDUZ wrote:
> Greetings,
>
> I've installed GFS to all members in a a RHEL AS/ES system. This cluster
> is connected to a SAN.
>
> Using one of the nodes, I mkfs'ed all partitions on SAN with GFS. When I
> want to mount those partitions, I get the error on subject. I google'd,
> however I could not find an exact solution to my problem. There are 9
> servers, 2+2 will run as a cluster, and 5 of them will work in standalone
> mode. Neither the standalone, nor the cluster members can mount the
> partitions.
>
> I have another question: The first 2-node cluster needs 4 TB of data.
> However, IBM reported us that we can only create 2 TB LUNs. So, I will
> need to create LVM and combine two partitions, right? Any guides/comments
> on that?
>
> Regards,
> --
> Devrim GUNDUZ
> Kivi Bili?im Teknolojileri         -          http://www.kivi.com.tr
> devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
>                        http://www.gunduz.org

-- 
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany




From brilong at cisco.com  Tue Feb 28 13:46:13 2006
From: brilong at cisco.com (Brian Long)
Date: Tue, 28 Feb 2006 08:46:13 -0500
Subject: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <1141089446.13130.55.camel@ayanami.boston.redhat.com>
References: <20060221125711.86256.qmail@web52302.mail.yahoo.com>
	<1141089446.13130.55.camel@ayanami.boston.redhat.com>
Message-ID: <1141134374.4433.14.camel@brilong-lnx>

On Mon, 2006-02-27 at 20:17 -0500, Lon Hohberger wrote:
> On Tue, 2006-02-21 at 04:57 -0800, SUVANKAR MOITRA wrote:
> > dear sir,
> > 
> > I want to install oracle 10g on Rhcs-4 cluster
> > active/passive.Pl send me a document for the above
> > installation .
> > 
> > Pl help me its very very urgent.
> > 
> > regards
> 
> First draft here:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182423

Being fairly new to Oracle clustering (failover or RAC), I have a dumb
question.  I just read documentation from Oracle about their Clusterware
product which is part of the 10g software.  A RAC license is not
required to use Clusterware.  Why would you deploy 10g failover on RHCS
instead of Clusterware?  Would you do this only if you've already got
RHCS installed on a few nodes and you're adding 10g to the config?

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From brilong at cisco.com  Tue Feb 28 13:50:29 2006
From: brilong at cisco.com (Brian Long)
Date: Tue, 28 Feb 2006 08:50:29 -0500
Subject: [Linux-cluster] Network failure results cluster
	environmentunstable & fragile
In-Reply-To: <1141090096.13130.72.camel@ayanami.boston.redhat.com>
References: <C0C1791E8EC6F249B5570F01409BD3EE9161EF@rmamsem1.emea.roche.com>
	<1141090096.13130.72.camel@ayanami.boston.redhat.com>
Message-ID: <1141134630.4433.18.camel@brilong-lnx>

On Mon, 2006-02-27 at 20:28 -0500, Lon Hohberger wrote:
> On Mon, 2006-02-27 at 09:22 +0100, Pena, Francisco Javier wrote:
> > Hi Deval,
> > 
> > If you are using iLO fencing, you could try the latest fence package
> > (1.32.10). I have seen a similar problem, and it is because recent iLO
> > firmware versions behave a little different (they try to make a soft
> > restart instead of a hard reboot). 
> 
> Also, if you're using iLO fencing, make sure you boot with acpi=off
> (see /etc/grub.conf) and have acpid stopped.  (e.g. chkconfig --level
> 2345 acpid off)
> 
> ACPI soft-poweroff (which iLO tries to do) is exactly what you do not
> want in a cluster.

Wow, this sounds fairly important, but I do not see this mentioned in
the RHCS-4 "Configuring and Managing a Cluster" documentation when I
look at the setup for ILO fencing.  Where is this documented so others
deploying HP Proliants don't fall into a trap?  :)

/Brian/

-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From francisco_javier.pena at roche.com  Tue Feb 28 14:03:17 2006
From: francisco_javier.pena at roche.com (Pena, Francisco Javier)
Date: Tue, 28 Feb 2006 15:03:17 +0100
Subject: [Linux-cluster] Network failure results
	clusterenvironmentunstable & fragile
Message-ID: <C0C1791E8EC6F249B5570F01409BD3EE9161F7@rmamsem1.emea.roche.com>


> On Mon, 2006-02-27 at 20:28 -0500, Lon Hohberger wrote:
> > On Mon, 2006-02-27 at 09:22 +0100, Pena, Francisco Javier wrote:
> > > Hi Deval,
> > > 
> > > If you are using iLO fencing, you could try the latest 
> fence package 
> > > (1.32.10). I have seen a similar problem, and it is 
> because recent 
> > > iLO firmware versions behave a little different (they try 
> to make a 
> > > soft restart instead of a hard reboot).
> > 
> > Also, if you're using iLO fencing, make sure you boot with acpi=off 
> > (see /etc/grub.conf) and have acpid stopped.  (e.g. 
> chkconfig --level 
> > 2345 acpid off)
> > 
> > ACPI soft-poweroff (which iLO tries to do) is exactly what 
> you do not 
> > want in a cluster.
> 
> Wow, this sounds fairly important, but I do not see this 
> mentioned in the RHCS-4 "Configuring and Managing a Cluster" 
> documentation when I look at the setup for ILO fencing.  
> Where is this documented so others deploying HP Proliants 
> don't fall into a trap?  :)
> 
> /Brian/
> 

Well, as far as I have been able to check, this is not necessary if you
are 
running the latest version of the fence package. Previous versions with
recent
iLO firmware (1.75+) caused this ACPI soft-poweroff issue (they were
simulating
a power button press, instead of a "press and hold for a few seconds").

I would just be happy if they added the perl-Crypt-SSLeay package as a 
Dependency to the fence package. If it is not installed, you may have a
nice
surprise when trying to fence off a node using the iLO :).

Javier



From lgodoy at atichile.com  Tue Feb 28 14:25:27 2006
From: lgodoy at atichile.com (Luis Godoy Gonzalez)
Date: Tue, 28 Feb 2006 11:25:27 -0300
Subject: [Linux-cluster] Link Status
Message-ID: <44045D57.90304@atichile.com>

Hi

I have installed RH Cluster Suite 4 U2.

My problem is with Link status for IP associated to one service.
If I loose connectivity with my router  the service restarts every 2 
minutes.

I have a similar problem with other Cluster ( Sun Cluster ), in this 
case I configured 3 default router.
But in Linux  it is not clear  for me if  I can  do the same ( over one 
interface ).

Is it possible to configure a pool of alternate IP addresses for Link 
status checking??
Is it possible to configure multiple default routers on one interface ??

The service in the cluster are acesible primarily from local network, 
the router access is not prority for service operation.
In this moment I disabled link status check, but I'am wish found a 
better solution.

Any help will be appreciated.
Thanks in advance

Luis G.



From baesso at ksolutions.it  Tue Feb 28 14:55:57 2006
From: baesso at ksolutions.it (Baesso Mirko)
Date: Tue, 28 Feb 2006 15:55:57 +0100
Subject: R: [Linux-cluster] oracle 10g installation on RHCS-4
Message-ID: <984C9DBB29704B47B7AAD308F2C95A3B04DDF6@kmail.ksolutions.it>

Sorry, can i use same setup for an active/active environment?
thanks

Baesso Mirko - System Engineer
KSolutions.S.p.A.
Via Lenin 132/26
56017  S.Martino Ulmiano (PI) - Italy
tel.+ 39 0 50 898369 fax. + 39 0 50 861200
baesso at ksolutions.it   http//www.ksolutions.it

-----Messaggio originale-----
Da: Lon Hohberger [mailto:lhh at redhat.com] 
Inviato: marted? 28 febbraio 2006 2.17
A: linux clustering
Oggetto: Re: [Linux-cluster] oracle 10g installation on RHCS-4

On Tue, 2006-02-21 at 04:57 -0800, SUVANKAR MOITRA wrote:
> dear sir,
> 
> I want to install oracle 10g on Rhcs-4 cluster
> active/passive.Pl send me a document for the above
> installation .
> 
> Pl help me its very very urgent.
> 
> regards

First draft here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182423

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster






From gforte at leopard.us.udel.edu  Tue Feb 28 15:03:25 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 28 Feb 2006 10:03:25 -0500
Subject: R: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <984C9DBB29704B47B7AAD308F2C95A3B04DDF6@kmail.ksolutions.it>
References: <984C9DBB29704B47B7AAD308F2C95A3B04DDF6@kmail.ksolutions.it>
Message-ID: <4404663D.9@leopard.us.udel.edu>

no.

Baesso Mirko wrote:
> Sorry, can i use same setup for an active/active environment?
> thanks
> 
> Baesso Mirko - System Engineer
> KSolutions.S.p.A.
> Via Lenin 132/26
> 56017  S.Martino Ulmiano (PI) - Italy
> tel.+ 39 0 50 898369 fax. + 39 0 50 861200
> baesso at ksolutions.it   http//www.ksolutions.it
> 
> -----Messaggio originale-----
> Da: Lon Hohberger [mailto:lhh at redhat.com] 
> Inviato: marted? 28 febbraio 2006 2.17
> A: linux clustering
> Oggetto: Re: [Linux-cluster] oracle 10g installation on RHCS-4
> 
> On Tue, 2006-02-21 at 04:57 -0800, SUVANKAR MOITRA wrote:
>> dear sir,
>>
>> I want to install oracle 10g on Rhcs-4 cluster
>> active/passive.Pl send me a document for the above
>> installation .
>>
>> Pl help me its very very urgent.
>>
>> regards
> 
> First draft here:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182423
> 
> -- Lon
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From baesso at ksolutions.it  Tue Feb 28 15:09:04 2006
From: baesso at ksolutions.it (Baesso Mirko)
Date: Tue, 28 Feb 2006 16:09:04 +0100
Subject: R: R: [Linux-cluster] oracle 10g installation on RHCS-4
Message-ID: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>

could you please send me some useful documents providing this setup
Thanks

Baesso Mirko - System Engineer
KSolutions.S.p.A.
Via Lenin 132/26
56017  S.Martino Ulmiano (PI) - Italy
tel.+ 39 0 50 898369 fax. + 39 0 50 861200
baesso at ksolutions.it   http//www.ksolutions.it


-----Messaggio originale-----
Da: Greg Forte [mailto:gforte at leopard.us.udel.edu] 
Inviato: marted? 28 febbraio 2006 16.03
A: linux clustering
Oggetto: Re: R: [Linux-cluster] oracle 10g installation on RHCS-4

no.

Baesso Mirko wrote:
> Sorry, can i use same setup for an active/active environment?
> thanks
> 
> Baesso Mirko - System Engineer
> KSolutions.S.p.A.
> Via Lenin 132/26
> 56017  S.Martino Ulmiano (PI) - Italy
> tel.+ 39 0 50 898369 fax. + 39 0 50 861200
> baesso at ksolutions.it   http//www.ksolutions.it
> 
> -----Messaggio originale-----
> Da: Lon Hohberger [mailto:lhh at redhat.com] 
> Inviato: marted? 28 febbraio 2006 2.17
> A: linux clustering
> Oggetto: Re: [Linux-cluster] oracle 10g installation on RHCS-4
> 
> On Tue, 2006-02-21 at 04:57 -0800, SUVANKAR MOITRA wrote:
>> dear sir,
>>
>> I want to install oracle 10g on Rhcs-4 cluster
>> active/passive.Pl send me a document for the above
>> installation .
>>
>> Pl help me its very very urgent.
>>
>> regards
> 
> First draft here:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182423
> 
> -- Lon
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster






From e.tano at palazzochigi.it  Tue Feb 28 15:09:20 2006
From: e.tano at palazzochigi.it (Tano Enzo)
Date: Tue, 28 Feb 2006 16:09:20 +0100
Subject: [Linux-cluster] Patch for fence dell era
Message-ID: <6C5FB8EA05488B44B62544668AB4EE9C1BD7AC@PCM-EXCH-VIRT2.pcm.it>


Hi,

I have created a patch for revision 1.3.2.2 of fence_drac.pl. My patch
add support for device ERA, I have tested with PowerEdge 2650 and Linux
RedHAT EL 4 Update 2.
Can I submit to official release? I can apply this patch also others
release.

Enzo



_______________________________________________________
Messaggio analizzato e protetto da tecnologia antivirus

Servizio erogato dal sistema informativo della
Presidenza del Consiglio dei Ministri
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_add_era
Type: application/octet-stream
Size: 1587 bytes
Desc: patch_add_era
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060228/8b6c704f/attachment.obj>

From brilong at cisco.com  Tue Feb 28 15:35:23 2006
From: brilong at cisco.com (Brian Long)
Date: Tue, 28 Feb 2006 10:35:23 -0500
Subject: R: R: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>
References: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>
Message-ID: <1141140923.4433.33.camel@brilong-lnx>

On Tue, 2006-02-28 at 16:09 +0100, Baesso Mirko wrote:
> could you please send me some useful documents providing this setup
> Thanks

Active/Active requires Oracle RAC licenses and you can use either NAS or
GFS or OCFS for storage.  Only NAS and GFS are supported by Red Hat, but
GFS is only certified on RHEL 3 today.

/Brian/

-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s



From lhh at redhat.com  Tue Feb 28 19:29:10 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 14:29:10 -0500
Subject: R: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <984C9DBB29704B47B7AAD308F2C95A3B04DDF6@kmail.ksolutions.it>
References: <984C9DBB29704B47B7AAD308F2C95A3B04DDF6@kmail.ksolutions.it>
Message-ID: <1141154950.13130.94.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 15:55 +0100, Baesso Mirko wrote:
> Sorry, can i use same setup for an active/active environment?
> thanks

Nope, that was all active/passive.  Ext3 file system.  Haven't tried
RAC, but I know that RAC runs on top of GFS.

-- Lon




From lhh at redhat.com  Tue Feb 28 19:29:58 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 14:29:58 -0500
Subject: R: R: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>
References: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>
Message-ID: <1141154998.13130.96.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 16:09 +0100, Baesso Mirko wrote:
> could you please send me some useful documents providing this setup

There's nothing special.  Following the RAC installation guide and
installing on top of GFS should "just work".

-- Lon




From lhh at redhat.com  Tue Feb 28 19:31:39 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 14:31:39 -0500
Subject: [Linux-cluster] Link Status
In-Reply-To: <44045D57.90304@atichile.com>
References: <44045D57.90304@atichile.com>
Message-ID: <1141155099.13130.99.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 11:25 -0300, Luis Godoy Gonzalez wrote:
> Hi
> 
> I have installed RH Cluster Suite 4 U2.


> My problem is with Link status for IP associated to one service.
> If I loose connectivity with my router  the service restarts every 2 
> minutes.

Router-checking has been removed in U3 beta, I believe.  It ended up
causing more problems than it actually solved.

-- Lon



From lhh at redhat.com  Tue Feb 28 19:32:17 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 14:32:17 -0500
Subject: R: R: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <1141154998.13130.96.camel@ayanami.boston.redhat.com>
References: <984C9DBB29704B47B7AAD308F2C95A3B04DDF7@kmail.ksolutions.it>
	<1141154998.13130.96.camel@ayanami.boston.redhat.com>
Message-ID: <1141155137.13130.100.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 14:29 -0500, Lon Hohberger wrote:
> On Tue, 2006-02-28 at 16:09 +0100, Baesso Mirko wrote:
> > could you please send me some useful documents providing this setup
> 
> There's nothing special.  Following the RAC installation guide and
> installing on top of GFS should "just work".

Though it might not be certified, as Brian stated.

-- Lon



From jerome.fernandes at gmail.com  Tue Feb 28 19:51:07 2006
From: jerome.fernandes at gmail.com (Jerome Fernandes)
Date: Tue, 28 Feb 2006 14:51:07 -0500
Subject: [Linux-cluster] Resource Info via clustat
Message-ID: <6f06b93d0602281151h1107c76an7ecafc4ece4007ca@mail.gmail.com>

Hi,

On RH4, "clustat -x" will give me the Service/Group info, but not the
"Resource" info that is defined in each Group. Is there any other method to
retrieve this information besides just viewing the /etc/cluster/cluster.conf
file? Any help would be appreciated.

Thanks,
Jerome.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060228/32f0d1da/attachment.htm>

From lhh at redhat.com  Tue Feb 28 20:04:41 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 15:04:41 -0500
Subject: [Linux-cluster] oracle 10g installation on RHCS-4
In-Reply-To: <1141134374.4433.14.camel@brilong-lnx>
References: <20060221125711.86256.qmail@web52302.mail.yahoo.com>
	<1141089446.13130.55.camel@ayanami.boston.redhat.com>
	<1141134374.4433.14.camel@brilong-lnx>
Message-ID: <1141157081.13130.126.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 08:46 -0500, Brian Long wrote:

> Being fairly new to Oracle clustering (failover or RAC), I have a dumb
> question.  I just read documentation from Oracle about their Clusterware
> product which is part of the 10g software.  A RAC license is not
> required to use Clusterware.  Why would you deploy 10g failover on RHCS
> instead of Clusterware?  Would you do this only if you've already got
> RHCS installed on a few nodes and you're adding 10g to the config?

That's a key reason.  There are advantages and disadvantages to both
approaches.  It comes down to preferences, mostly.

Using linux-cluster (aka RHCS, in this case)...

- You do not need to provision raw devices for membership (or anything
at all, actually)
- You get the benefits of fencing (for those who are paranoid ;) )
- You can use ext3 as the back-end store for any and all parts of the
database installation.

Obviously, whether any of the above is an advantage to you is a matter
of preference too ;)

-- Lon



From lhh at redhat.com  Tue Feb 28 21:15:28 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 28 Feb 2006 16:15:28 -0500
Subject: [Linux-cluster] Resource Info via clustat
In-Reply-To: <6f06b93d0602281151h1107c76an7ecafc4ece4007ca@mail.gmail.com>
References: <6f06b93d0602281151h1107c76an7ecafc4ece4007ca@mail.gmail.com>
Message-ID: <1141161328.13130.129.camel@ayanami.boston.redhat.com>

On Tue, 2006-02-28 at 14:51 -0500, Jerome Fernandes wrote:
> Hi,
>  
> On RH4, "clustat -x" will give me the Service/Group info, but not the
> "Resource" info that is defined in each Group. Is there any other
> method to retrieve this information besides just viewing
> the /etc/cluster/cluster.conf file? Any help would be appreciated.
>  

Not at the moment.  If any part of the service fails (e.g. any resource)
in such a way that it can't easily be recovered, the whole service is
restarted.

-- Lon





From jalmeter_99 at yahoo.com  Mon Feb 27 22:00:12 2006
From: jalmeter_99 at yahoo.com (jalmeter_99 at yahoo.com)
Date: 27 Feb 2006 17:00:12 -0500
Subject: [Linux-cluster] where to find user documentation for DLM?
Message-ID: <uslq4sbrn.fsf@yahoo.com>


Hello,

I have googled everything I can think of, but haven't found any
documentation for using DLM as a developer.  Would someone please
point me at a tutorial of some kind?

Background:

My employer has recently (last week) set up a GFS cluster for
evaluation.  I am trying to set up a locking test that mimics a system
that we use on VMS.

Thanks,
-Jason

-- 
Jason L. Almeter
jalmeter underscore 99 yahoo



From afletdinov at dc.baikal.ru  Fri Feb 17 06:29:47 2006
From: afletdinov at dc.baikal.ru (Afletdinov A.R.)
Date: Fri, 17 Feb 2006 06:29:47 -0000
Subject: [Linux-cluster] GFS with default ACL on new folder is not applied ?!
Message-ID: <43F56D50.6020203@dc.baikal.ru>

Hi All!

I use version:
kernel-2.6.9-22.0.2.ELsmp
GFS-kernel-smp-2.6.9-45.0.2
GFS-6.1.3-0
 
I use GFS with ACL:
# mount | grep gfs
/dev/sda5 on /home type gfs (rw,noatime,acl)
 
/dev/sda it hardware ISCSI QLogiq 4010C.
 
Examples error:
 
# cd /home
# mkdir test_acl
 
If now to apply default ACL on folder we receive a errors.
 
# setfacl -d -m g:dc_local:rwx test_acl/
setfacl: test_acl: Operation not permitted
 
 
After that the same without errors!?
# setfacl -d -m g:dc_local:rwx test_acl/
# getfacl test_acl
 
# file: test_acl
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:dc_local:rwx
default:mask::rwx
default:other::r-x


I have checked up as and on a local SCSI disk, effect same.



From joshua at dev.imr-net.com  Fri Feb 17 23:48:36 2006
From: joshua at dev.imr-net.com (Joshua Schmidlkofer)
Date: Fri, 17 Feb 2006 23:48:36 -0000
Subject: [Linux-cluster] Multi-Node Clustering?
Message-ID: <1140220112.27132.20.camel@strapped.imr-net.com>

I am a little confused after reading various docs.  Is it possible to
use EVMS and Clustering with more than two nodes?  I have a small fiber
channel chassis, and I want to hook up three hosts.  I am planning to
use GFS for a Zope ZEO database.  Two FSs would be read-only and one
r/w.  However some of the evms docs have suggested that weare limited to
two host cluster.   


Thanks for any guidance.


Sincerely,
  Joshua
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060217/8cbdd961/attachment.sig>

From dym3fyehe at aol.com  Sat Feb 18 07:55:04 2006
From: dym3fyehe at aol.com (dym3fyehe at aol.com)
Date: Sat, 18 Feb 2006 07:55:04 -0000
Subject: [Linux-cluster] Service try to start before disk mount but failed
Message-ID: <8C80277BB22D36D-10F8-20476@mblk-r26.sysops.aol.com>

I am using RHCS V4 for Oracle database and another application running on the two node failover cluster system. The application has different services and installed in the shared storage. The cluster manager starts the services before the shared drives mounted to the second node. I use script to start and stop the service with time delay but still the service start before the shared storage moved to the second node and failed. Is any body knows how I can control the storage resource and the service? I want to make the service to wait until the shared storage available in the second node.
 
thanks,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/1f7a3830/attachment.htm>

From suvankar_moitra at pcstech.com  Sat Feb 18 10:34:50 2006
From: suvankar_moitra at pcstech.com (Suvankar Moitra)
Date: Sat, 18 Feb 2006 10:34:50 -0000
Subject: [Linux-cluster] Rhcs-4 and oracle 10g 
Message-ID: <000601c55228$cc3c71d0$c996a8c0@tsg>

dear sir ,

Pl send a step step by guide for the above installation:-

1> RHEL -4
2> rhcs-4
3> oracle 10g 

i want to install active/passive cluster on rhcs-4 .How can i install oracle 10g in the cluster.I want failover .

Pl help me..........


regards

Suvankar Moitra
+91-033-9830152623
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060218/b6287a0c/attachment.htm>

From hirantha at vcs.informatics.lk  Mon Feb 27 03:53:26 2006
From: hirantha at vcs.informatics.lk (Hirantha Wijayawardena)
Date: Mon, 27 Feb 2006 09:53:26 +0600 (LKT)
Subject: [Linux-cluster] Network failure results cluster environment
	unstable & fragile
In-Reply-To: <001401c639cd$0170b3c0$6e00a8c0@PROGRESSION>
Message-ID: <2353676.6811141012406600.JavaMail.root@ux-mail>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060227/ad7422ce/attachment.htm>

From etuser2005 at yahoo.com  Tue Feb 28 14:23:16 2006
From: etuser2005 at yahoo.com (Simone Nervi)
Date: Tue, 28 Feb 2006 06:23:16 -0800 (PST)
Subject: [Linux-cluster] patch for cluster
Message-ID: <20060228142316.24021.qmail@web35514.mail.mud.yahoo.com>

Hi,

Can I send my patch for fence agents? Which is the
procedure?

Thanks for help.


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From deval.kulshrestha at progression.com  Thu Feb 23 11:42:40 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Thu, 23 Feb 2006 17:12:40 +0530
Subject: [Linux-cluster] Network failure make cluster Unstable
Message-ID: <00a501c6386e$43b8cb60$ad00a8c0@PROGRESSION>

Hi 

I struggling to get some help on following configuration, even logged a call
with redhat support but didn' t get any satisfactory reply.

 

 

My HA Cluster Setup details

 

1.	HP DL 360 G4p Server                       2nos.
2.	HP MSA 500 G2 (SAN)                     1nos.
3.	RedHat Enterprise Linux 4 ES  
4.	Red Hat Cluster Suite 4

 

Server does have a HP SCSI HBA. MSA 500G2 is a scsi based SAN. Both of these
server are connected to SAN using SCSI VHDCI cable. I used a network switch
to establish network connectivity for the server. created a disk array of
three HDD on SAN with two logical volumes than  I have installed RHEL 4
Update 1 on both server(Servers are configured with RAID 1) than installed
all HP drivers and management agents. After server configuration and OS
installation I have installed Red Hat Cluster Suite v 4 on both the machine.

 

Than I have configured Cluster using Cluster Configuration Manager. Added
member hosts, configured fence device and assigned to member host(HP iLO is
certified as an fence device), Configured Failover domain with node
priority, configured resources such as floating IP address, File System,
Script, than configured service which need to be run in HA mode.

 

After configuring this I have tested with various scenario HA is working
properly, when ever powered off any machine , services fail over on
available node. 

 

Problem:

 

If network goes off on node1, and service which were not running on node1
are started by node1 with shared storage mount point, which was already
running on node 2 but both of these nodes are not able to communicate to
each other node2 anyway already running the same service with shared storage
mount point. Because of Fencing both of these nodes try to kill each other.
If network comes back nodes don't synchronize with each other .Hence this
entire setup become unstable.

 

 

 

 

With Regard

Deval

Progression Infonet Pvt. Ltd. 
55, Independent Electronic Modules, 
Sector - 18, Electronic City, 
Gurgaon - 122015

India
Tel          : - 0124 - 2455070, Ext. 215, Fax: 91-124-2398647
Mobile   : - 98186 -82509 
URL        : - www.progression.com 

 

===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060223/33eb307e/attachment.htm>