From yazan at ccs.com.jo  Sun Jan  2 18:02:58 2005
From: yazan at ccs.com.jo (Yazan  Bakheit)
Date: Sun, 2 Jan 2005 10:02:58 -0800
Subject: [Linux-cluster] quorum problem
Message-ID: <EXMAILrblZz02H67yRL00000121@Exmail.ccs.com.jo>

 
hi,
 
     i installed the cluster suite and then i found a problem in the shared
as i cant see it,and then i used the gfs and configured it and solve the
problem and after that i want to use the cluster suite gui but here in the
gui there is a check box called (Has Quorum) but i cant checked it it seemes
to be hidden, how can it be activated. i want to tell you that i used the
documentation for the gfs which is (rh-gfsico-en-6.0) and i perform every
thing but as two nodes, every thing is OK, but the original cluster suit
seems to be not working, what should i do now? , i will send or write you
the whole configuration that i used or creatre from the beginning if you
want.
 
  Please Execuse me if im nagging you with these cases or these questions, i
know that my questions may appeare as a stupid question but i am new in the
field and i really need a help.
 
          Tahnk You
 
 
    Regards 
 
       Yazan.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050102/454589b5/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tech.gif
Type: image/gif
Size: 862 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050102/454589b5/attachment.gif>

From rstevens at vitalstream.com  Mon Jan  3 17:21:42 2005
From: rstevens at vitalstream.com (Rick Stevens)
Date: Mon, 03 Jan 2005 09:21:42 -0800
Subject: [Linux-cluster] GFS and Storage greater than 2 TB
In-Reply-To: <41D4F81F.6000907@andrew.cmu.edu>
References: <75E9203E0F0DD511B37E00D0B789D45007E64F44@fcv-stgo.cverde.cl>
	<41D4F81F.6000907@andrew.cmu.edu>
Message-ID: <41D97F26.6050205@vitalstream.com>

Jacob Joseph wrote:
> Does this limit still exist with the cvs GFS on a 2.6 kernel?

I believe the limit under a 2.6 kernel is 16TB, but I've not checked it.

> 
> -Jacob
> 
> Markus Miller wrote:
> 
>> Thank you for the answer. That is all I needed to know.
>>
>> -----Mensaje original-----
>> De: Rick Stevens [mailto:rstevens at vitalstream.com]
>> Enviado el: Thursday, December 30, 2004 5:03 PM
>> Para: linux clistering
>> Asunto: Re: [Linux-cluster] GFS and Storage greater than 2 TB
>>
>>
>> Markus Miller wrote:
>>
>>> Hi,
>>>
>>> researching I found a posting to this list made by Kevin Anderson 
>>> (Date: Tue, 19 Oct 2004 17:56:24 -0500) where he states the following:
>>>
>>> ---snip---
>>> Maximum size of each GFS filesystem for RHEL3 (2.4.x kernel) is 2 TB,
>>> you can have multiple filesystems of that level.  So, to get access to
>>> 10TB of data requires a minimum of 5 separate filesystems/storage
>>> combinations.
>>> ---snip---
>>>
>>> What do I have to do to achive this? Do I have to configure several 
>>> GFS clusters in the cluster.ccs file (each of a m?ximum size of 2 
>>> TB)? Or do I have to configure one GFS cluster with serveral 
>>> filesystems each with a maximum size of 2 TB? The GFS Admin Guide is 
>>> not very precise, but what's really confusing me is the statement on 
>>> page 12: "2 TB maximum, for total of all storage connected to a GFS 
>>> cluster."
>>>
>>> At the moment we are evaluating to buy servers and storage, therefore 
>>> I do not have any equipment to do the testing myself.
>>> Any coment is highly apreciated.
>>
>>
>>
>> It's the GFS filesystem that has the limit (actually, it's the 2.4
>> kernel).  Essentially, "gfs_mkfs" can only handle a maximum of 2TB.
>>
>> What he means above is that you have to have five separate partitions
>> of 2TB each and each with a GFS filesystem on them.  You have to mount
>> those five filesystems separately.  If you're using VG/LVM, with a VG
>> as "vggroup" and LVs in that group as "test1" through "test5":
>>
>>     mount -t gfs /dev/mapper/vggroup-test1 /mnt/gfs1
>>     mount -t gfs /dev/mapper/vggroup-test2 /mnt/gfs2
>>     mount -t gfs /dev/mapper/vggroup-test3 /mnt/gfs3
>>     mount -t gfs /dev/mapper/vggroup-test4 /mnt/gfs4
>>     mount -t gfs /dev/mapper/vggroup-test5 /mnt/gfs5
>>
>> How you use them after that is up to you.  Just remember that a given
>> GFS filesystem under kernel 2.4 is limited to 2TB maximum
>> ----------------------------------------------------------------------
>> - Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
>> - VitalStream, Inc.                       http://www.vitalstream.com -
>> -                                                                    -
>> -        Brain:  The organ with which we think that we think.        -
>> ----------------------------------------------------------------------
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> http://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> http://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-    Admitting you have a problem is the first step toward getting   -
-    medicated for it.      -- Jim Evarts (http://www.TopFive.com)   -
----------------------------------------------------------------------


From yazan at ccs.com.jo  Tue Jan  4 17:54:13 2005
From: yazan at ccs.com.jo (Yazan  Bakheit)
Date: Tue, 4 Jan 2005 09:54:13 -0800
Subject: [Linux-cluster] quorum problem
Message-ID: <EXMAILY6SzDqtxXD6U3000001be@Exmail.ccs.com.jo>

 
hi,
 
 
         how can i make the checkbox for the quorum in the gui utility
active,
 
        i mean that when i request the gui for the cluster suit there is a
check box called 
        (Has Quorum) and it is look to be hidden and i can't checked it even
i have made
        two partitions for the quorom and i add them to the cluster and to
the /etc/sysconfig/rawdevices
        but i cant checked it .
 
 
        how can i solve this?
 
        Thanks
 
       Yazan.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050104/d8b1e52b/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tech.gif
Type: image/gif
Size: 862 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050104/d8b1e52b/attachment.gif>

From pcaulfie at redhat.com  Tue Jan  4 09:16:06 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 4 Jan 2005 09:16:06 +0000
Subject: [Linux-cluster] sock_alloc 2.6.10 && gfs
In-Reply-To: <22514.1104496178@www4.gmx.net>
References: <22514.1104496178@www4.gmx.net>
Message-ID: <20050104091605.GA23831@tykepenguin.com>

On Fri, Dec 31, 2004 at 01:29:38PM +0100, Svetoslav Slavtchev wrote:
> Hi guys,
> it seems sock_alloc became static in 2.6.10
> and i'm not sure how exactly to fix gfs
> (
> cluster/dlm/lowcomms.c:454
>         memset(&peeraddr, 0, sizeof(peeraddr));
>         newsock = sock_alloc();
>         if (!newsock)
>                 return -ENOMEM;
> 
> )
> 
> do you think it'll be enough just to revert the change ?
> (see attached diff )

As a quick hack that should work.

What /should/ be done is to change lowcomms to use sock_create_kern()

-- 

patrick


From pcaulfie at redhat.com  Tue Jan  4 11:29:24 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 4 Jan 2005 11:29:24 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050104112924.GB23831@tykepenguin.com>

On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> > > 
> > > How does one know what the current "generation" number is?
> > 
> > You don't, cman does. it's the current "generation" of the cluster which is
> > incremented for each state transition. Are you taking nodes up and down during
> > these tests??
> 
> The nodes are staying up.  I am mounting and umounting a lot.
> Any reason to not add generation /proc/cluster/status?  (it would help
> debugging at least).

No reason at all not to, apart from I really don't think it will tell anyone
anything useful. The cause of the problem is that the CMAN heartbeat messages
are being lost on the network flooded by lock traffic. generation mismatches are
just a symptom of that.
 
> 
> I currently have it set up for manual fencing and I have yet to see that
> work correctly.  This was a 3 node cluster.  cl032 got the bad
> generation number and cman was "killed by STARTTRANS or NOMINATE"
> cl030 got a bad generation number (but stayed up) and cl031 leaves
> the cluster because it says cl030 told it to.  So that leaves me
> with 1 node up without quorum.  I did not see any fencing messages.
> 
> Should the surviving node (cl030) have attempted fencing or does
> it only do that if it has quorum?

ah no, fencing will only happen if the cluster has quorum.
 
> I do not seem to be able to keep cman up for much past 2 days if 
> I have my tests running.  (it stays up with no load, of course).
> My tests are not the complicated currently either.  Just tar, du
> and rm in separate directories from 1, 2 and then 3 nodes
> simultaneously.  Who knows what will happen if I add tests
> to cause lots of dlm lock conflict.
> How long does cman stay up in your testing?

I've never had iSCSI stay up long enough to find out :(

-- 

patrick


From mbrookov at mines.edu  Tue Jan  4 14:44:59 2005
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Tue, 04 Jan 2005 07:44:59 -0700
Subject: [Linux-cluster] ISCSI? (was cman bad generation number)
In-Reply-To: <20050104112924.GB23831@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
Message-ID: <1104849899.4815.13.camel@merlin.Mines.EDU>

On Tue, 2005-01-04 at 04:29, Patrick Caulfield wrote:


> I've never had iSCSI stay up long enough to find out :(


Which iSCSI are you using?

We are considering buying iSCSI based hardware from Left Hand Networks. 
I have not done any heavy testing, but I have used UNH-ISCSI for both
the target and initiator with GFS and did not have any problems.  Should
I re-think this plan?

Matt

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050104/d2788801/attachment.htm>

From pcaulfie at redhat.com  Tue Jan  4 16:14:27 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 4 Jan 2005 16:14:27 +0000
Subject: [Linux-cluster] ISCSI? (was cman bad generation number)
In-Reply-To: <1104849899.4815.13.camel@merlin.Mines.EDU>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
Message-ID: <20050104161427.GC7994@tykepenguin.com>

On Tue, Jan 04, 2005 at 07:44:59AM -0700, Matthew B. Brookover wrote:
>    On Tue, 2005-01-04 at 04:29, Patrick Caulfield wrote:
> 
>  I've never had iSCSI stay up long enough to find out :(
> 
>    Which iSCSI are you using?
> 
>    We are considering buying iSCSI based hardware from Left Hand Networks.  I
>    have not done any heavy testing, but I have used UNH-ISCSI for both the
>    target and initiator with GFS and did not have any problems.  Should I
>    re-think this plan?

It might be my environment, others haven't reported problems. But my
linux-iscsi-4.0.1.10 on kernel 2.6.9 just locks up on a regular basis -
regardless of whether there is I/O to the device or not.

-- 

patrick


From pcaulfie at redhat.com  Tue Jan  4 16:42:17 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 4 Jan 2005 16:42:17 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050104164217.GE7994@tykepenguin.com>

Just to check that you are seeing what I think you're seeing, can you set some
cman variables to increase the heartbeat frequency:

echo "9" > /proc/cluster/config/cman/max_retries
echo "1" > /proc/cluster/config/cman/hello_timer

You'll need to do this before "cman_tool join".

Thanks,

patrick


From crh at ubiqx.mn.org  Tue Jan  4 21:19:58 2005
From: crh at ubiqx.mn.org (Christopher R. Hertel)
Date: Tue, 4 Jan 2005 15:19:58 -0600
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <20050104161427.GC7994@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
	<20050104161427.GC7994@tykepenguin.com>
Message-ID: <20050104211958.GP31004@Favog.ubiqx.mn.org>

On Tue, Jan 04, 2005 at 04:14:27PM +0000, Patrick Caulfield wrote:
> On Tue, Jan 04, 2005 at 07:44:59AM -0700, Matthew B. Brookover wrote:
> >    On Tue, 2005-01-04 at 04:29, Patrick Caulfield wrote:
> > 
> >  I've never had iSCSI stay up long enough to find out :(

This is off-topic, but...

Is the Ardis target the only iSCSI target source available for Linux?

Chris -)-----

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


From daniel at osdl.org  Tue Jan  4 22:22:26 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Tue, 04 Jan 2005 14:22:26 -0800
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050104164217.GE7994@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104164217.GE7994@tykepenguin.com>
Message-ID: <1104877346.2838.17.camel@ibm-c.pdx.osdl.net>

On Tue, 2005-01-04 at 08:42, Patrick Caulfield wrote:
> Just to check that you are seeing what I think you're seeing, can you set some
> cman variables to increase the heartbeat frequency:
> 
> echo "9" > /proc/cluster/config/cman/max_retries
> echo "1" > /proc/cluster/config/cman/hello_timer
> 
> You'll need to do this before "cman_tool join".
> 
> Thanks,
> 
> patrick
> 

I'll give this a try and let it run overnight.

Daniel


From mbrookov at mines.edu  Tue Jan  4 22:23:10 2005
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Tue, 04 Jan 2005 15:23:10 -0700
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <20050104211958.GP31004@Favog.ubiqx.mn.org>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
	<20050104161427.GC7994@tykepenguin.com>
	<20050104211958.GP31004@Favog.ubiqx.mn.org>
Message-ID: <1104877390.4815.35.camel@merlin.Mines.EDU>

The unh-iscsi implements both target and initiator, see
http://unh-iscsi.sourceforge.net/ for more information.  I have used it
with GFS and Linux 2.6.8.1.  I could not get unh-iscsi to compile with
linux 2.6.9 and switched to hardware scsi for more recent testing.  I
believe there is a fix for 2.6.9.

Matt

On Tue, 2005-01-04 at 14:19, Christopher R. Hertel wrote:

> On Tue, Jan 04, 2005 at 04:14:27PM +0000, Patrick Caulfield wrote:
> > On Tue, Jan 04, 2005 at 07:44:59AM -0700, Matthew B. Brookover wrote:
> > >    On Tue, 2005-01-04 at 04:29, Patrick Caulfield wrote:
> > > 
> > >  I've never had iSCSI stay up long enough to find out :(
> 
> This is off-topic, but...
> 
> Is the Ardis target the only iSCSI target source available for Linux?
> 
> Chris -)-----
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050104/876a0cb8/attachment.htm>

From crh at ubiqx.mn.org  Tue Jan  4 22:44:32 2005
From: crh at ubiqx.mn.org (Christopher R. Hertel)
Date: Tue, 4 Jan 2005 16:44:32 -0600
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <1104877390.4815.35.camel@merlin.Mines.EDU>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
	<20050104161427.GC7994@tykepenguin.com>
	<20050104211958.GP31004@Favog.ubiqx.mn.org>
	<1104877390.4815.35.camel@merlin.Mines.EDU>
Message-ID: <20050104224432.GS31004@Favog.ubiqx.mn.org>

On Tue, Jan 04, 2005 at 03:23:10PM -0700, Matthew B. Brookover wrote:
> The unh-iscsi implements both target and initiator, see
> http://unh-iscsi.sourceforge.net/ for more information.  I have used it
> with GFS and Linux 2.6.8.1.  I could not get unh-iscsi to compile with
> linux 2.6.9 and switched to hardware scsi for more recent testing.  I
> believe there is a fix for 2.6.9.
> 
> Matt

Thanks!

I should have remembered that UNH was working on iSCSI.

Chris -)-----

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


From daniel at osdl.org  Tue Jan  4 22:46:17 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Tue, 04 Jan 2005 14:46:17 -0800
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050104112924.GB23831@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
Message-ID: <1104878776.2838.42.camel@ibm-c.pdx.osdl.net>

On Tue, 2005-01-04 at 03:29, Patrick Caulfield wrote:
> On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> > > > 
> > > > How does one know what the current "generation" number is?
> > > 
> > > You don't, cman does. it's the current "generation" of the cluster which is
> > > incremented for each state transition. Are you taking nodes up and down during
> > > these tests??
> > 
> > The nodes are staying up.  I am mounting and umounting a lot.
> > Any reason to not add generation /proc/cluster/status?  (it would help
> > debugging at least).
> 
> No reason at all not to, apart from I really don't think it will tell anyone
> anything useful. The cause of the problem is that the CMAN heartbeat messages
> are being lost on the network flooded by lock traffic. generation mismatches are
> just a symptom of that.
>  

One thing I do not understand is that I am leaving the nodes in the
cluster and just doing mounting and umounting, so the generation number
should not be changing.

I think you are saying the the lock traffic is so high that the heart
are lost so the node being kicked out is seeing the new heart beat
from the other nodes and doesn't know they are not receiving his
heartbeat messages.  This node must be seeing the other nodes
heartbeat messages or it would have started a membership transition
without the other nodes.  Do I have this right?

Shouldn't the heartbeat messages have higher priority
over the lock traffic messages? 

Shouldn't there be a way of throttling back the lock traffic and seeing
if heartbeat connection can be re-established before starting a
membership transition?

Daniel


From daniel at osdl.org  Wed Jan  5 01:13:02 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Tue, 04 Jan 2005 17:13:02 -0800
Subject: [Linux-cluster] dlm patch to fix referencing free memory
Message-ID: <1104887581.7044.4.camel@ibm-c.pdx.osdl.net>

I checked out the latest cvs and noticed my patch to fix
the referencing of freed memory is not included.

Here is the patch again.  Please let me know how to get this
patch into the cvs tree.

Thanks,

Daniel

Looking through the code, I found when that a call to
queue_ast(lkb, AST_COMP | AST_DEL, 0); will lead to
process_asts() which will free the dlm_rsb.  So there
is a race where the rsb can be freed BEFORE we do the
up_write(rsb->res_lock);

The fix is simple, do the up_write() before the queue_ast().

--- cluster.orig/dlm-kernel/src/locking.c	2004-12-09 15:23:13.789834384 -0800
+++ cluster/dlm-kernel/src/locking.c	2004-12-09 15:24:51.809742940 -0800
@@ -687,8 +687,13 @@ void dlm_lock_stage3(struct dlm_lkb *lkb
 		lkb->lkb_retstatus = -EAGAIN;
 		if (lkb->lkb_lockqueue_flags & DLM_LKF_NOQUEUEBAST)
 			send_blocking_asts_all(rsb, lkb);
+		/*
+		 * up the res_lock before queueing ast, since the AST_DEL will
+		 * cause the rsb to be released and that can happen anytime.
+		 */
+		up_write(&rsb->res_lock);
 		queue_ast(lkb, AST_COMP | AST_DEL, 0);
-		goto out;
+		return;
 	}
 
 	/*
@@ -888,7 +893,13 @@ int dlm_unlock_stage2(struct dlm_lkb *lk
 	lkb->lkb_retstatus = flags & DLM_LKF_CANCEL ? -DLM_ECANCEL:-DLM_EUNLOCK;
 
 	if (!remote) {
+		/*
+		 * up the res_lock before queueing ast, since the AST_DEL will
+		 * cause the rsb to be released and that can happen anytime.
+		 */
+		up_write(&rsb->res_lock);
 		queue_ast(lkb, AST_COMP | AST_DEL, 0);
+		goto out2;
 	} else {
 		up_write(&rsb->res_lock);
 		release_lkb(rsb->res_ls, lkb);


From teigland at redhat.com  Wed Jan  5 03:08:34 2005
From: teigland at redhat.com (David Teigland)
Date: Wed, 5 Jan 2005 11:08:34 +0800
Subject: [Linux-cluster] dlm patch to fix referencing free memory
In-Reply-To: <1104887581.7044.4.camel@ibm-c.pdx.osdl.net>
References: <1104887581.7044.4.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050105030834.GB5770@redhat.com>

On Tue, Jan 04, 2005 at 05:13:02PM -0800, Daniel McNeil wrote:
> I checked out the latest cvs and noticed my patch to fix
> the referencing of freed memory is not included.
> 
> Here is the patch again.  Please let me know how to get this
> patch into the cvs tree.

Sorry, got it now.  Thanks for the fix.

-- 
Dave Teigland  <teigland at redhat.com>


From pcaulfie at redhat.com  Wed Jan  5 09:00:44 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 5 Jan 2005 09:00:44 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1104878776.2838.42.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104878776.2838.42.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050105090043.GA3866@tykepenguin.com>

On Tue, Jan 04, 2005 at 02:46:17PM -0800, Daniel McNeil wrote:
> 
> One thing I do not understand is that I am leaving the nodes in the
> cluster and just doing mounting and umounting, so the generation number
> should not be changing.
> 
> I think you are saying the the lock traffic is so high that the heart
> are lost so the node being kicked out is seeing the new heart beat
> from the other nodes and doesn't know they are not receiving his
> heartbeat messages.  This node must be seeing the other nodes
> heartbeat messages or it would have started a membership transition
> without the other nodes.  Do I have this right?

Yes, I think. It's all a bit vague. If it wasn't I might have an answer by now
:-(
 
> Shouldn't the heartbeat messages have higher priority
> over the lock traffic messages? 

They do. That's why I am puzzled. I'm currently investigating if the heartbeat
thread is being starved of CPU time by either the DLM or GFS.
 
> Shouldn't there be a way of throttling back the lock traffic and seeing
> if heartbeat connection can be re-established before starting a
> membership transition?

DLM & CMAN are not that tightly coupled.

-- 

patrick


From hkubota at gmx.net  Wed Jan  5 13:49:38 2005
From: hkubota at gmx.net (Harald Kubota)
Date: Wed, 05 Jan 2005 22:49:38 +0900
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <20050104224432.GS31004@Favog.ubiqx.mn.org>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>	<20041222090832.GB1260@tykepenguin.com>	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>	<20050104112924.GB23831@tykepenguin.com>	<1104849899.4815.13.camel@merlin.Mines.EDU>	<20050104161427.GC7994@tykepenguin.com>	<20050104211958.GP31004@Favog.ubiqx.mn.org>	<1104877390.4815.35.camel@merlin.Mines.EDU>
	<20050104224432.GS31004@Favog.ubiqx.mn.org>
Message-ID: <41DBF072.5060407@gmx.net>

There is one more iscsi target available for Linux: 
http://sourceforge.net/projects/iscsitarget/
I use it for testing (not clustered yet) and it works surprisingly well 
(if the network is stable).

Harald


From crh at ubiqx.mn.org  Wed Jan  5 18:57:15 2005
From: crh at ubiqx.mn.org (Christopher R. Hertel)
Date: Wed, 5 Jan 2005 12:57:15 -0600
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <41DBF072.5060407@gmx.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
	<20050104161427.GC7994@tykepenguin.com>
	<20050104211958.GP31004@Favog.ubiqx.mn.org>
	<1104877390.4815.35.camel@merlin.Mines.EDU>
	<20050104224432.GS31004@Favog.ubiqx.mn.org>
	<41DBF072.5060407@gmx.net>
Message-ID: <20050105185715.GD8351@Favog.ubiqx.mn.org>

Looks like this one has had more recent development as well.

Thanks!

Chris -)-----

On Wed, Jan 05, 2005 at 10:49:38PM +0900, Harald Kubota wrote:
> There is one more iscsi target available for Linux: 
> http://sourceforge.net/projects/iscsitarget/
> I use it for testing (not clustered yet) and it works surprisingly well 
> (if the network is stable).
> 
> Harald
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


From bastian at waldi.eu.org  Wed Jan  5 22:18:55 2005
From: bastian at waldi.eu.org (Bastian Blank)
Date: Wed, 5 Jan 2005 23:18:55 +0100
Subject: [Linux-cluster] ISCSI?
In-Reply-To: <1104877390.4815.35.camel@merlin.Mines.EDU>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104849899.4815.13.camel@merlin.Mines.EDU>
	<20050104161427.GC7994@tykepenguin.com>
	<20050104211958.GP31004@Favog.ubiqx.mn.org>
	<1104877390.4815.35.camel@merlin.Mines.EDU>
Message-ID: <20050105221855.GA27974@wavehammer.waldi.eu.org>

On Tue, Jan 04, 2005 at 03:23:10PM -0700, Matthew B. Brookover wrote:
> The unh-iscsi implements both target and initiator, see
> http://unh-iscsi.sourceforge.net/ for more information.  I have used it
> with GFS and Linux 2.6.8.1.  I could not get unh-iscsi to compile with
> linux 2.6.9 and switched to hardware scsi for more recent testing.  I

The initiator locks itself to death if used on smp systems.

Bastian

-- 
Ahead warp factor one, Mr. Sulu.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050105/8d9c5d38/attachment.sig>

From daniel at osdl.org  Wed Jan  5 22:19:01 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Wed, 05 Jan 2005 14:19:01 -0800
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050105090043.GA3866@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104878776.2838.42.camel@ibm-c.pdx.osdl.net>
	<20050105090043.GA3866@tykepenguin.com>
Message-ID: <1104963541.14834.10.camel@ibm-c.pdx.osdl.net>

On Wed, 2005-01-05 at 01:00, Patrick Caulfield wrote:
> On Tue, Jan 04, 2005 at 02:46:17PM -0800, Daniel McNeil wrote:
> > 
> > One thing I do not understand is that I am leaving the nodes in the
> > cluster and just doing mounting and umounting, so the generation number
> > should not be changing.
> > 
> > I think you are saying the the lock traffic is so high that the heart
> > are lost so the node being kicked out is seeing the new heart beat
> > from the other nodes and doesn't know they are not receiving his
> > heartbeat messages.  This node must be seeing the other nodes
> > heartbeat messages or it would have started a membership transition
> > without the other nodes.  Do I have this right?
> 
> Yes, I think. It's all a bit vague. If it wasn't I might have an answer by now
> :-(
>  
> > Shouldn't the heartbeat messages have higher priority
> > over the lock traffic messages? 
> 
> They do. That's why I am puzzled. I'm currently investigating if the heartbeat
> thread is being starved of CPU time by either the DLM or GFS.
>  
> > Shouldn't there be a way of throttling back the lock traffic and seeing
> > if heartbeat connection can be re-established before starting a
> > membership transition?
> 
> DLM & CMAN are not that tightly coupled.

Do DLM and CMAN use a common communication layer?

I was expecting that they would since having multiple
interfaces for redundancy would be something they
would both want.  DLM should just want to be able
to send messages to other nodes and shouldn't care
how it gets there.  I was expecting this to be
part of CMAN since it should know which interfaces are
connected to which nodes and their state.  It could
also load balance on multiple networks.  Is there a
description of how multiple interfaces are handle today?

Thanks,

Daniel


From pcaulfie at redhat.com  Thu Jan  6 08:47:19 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 6 Jan 2005 08:47:19 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1104963541.14834.10.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104878776.2838.42.camel@ibm-c.pdx.osdl.net>
	<20050105090043.GA3866@tykepenguin.com>
	<1104963541.14834.10.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050106084719.GA4923@tykepenguin.com>

On Wed, Jan 05, 2005 at 02:19:01PM -0800, Daniel McNeil wrote:
> 
> Do DLM and CMAN use a common communication layer?

No. They should, but the communications in CMAN is primitive and not up
to supporting the high levels of traffic that the DLM can generate. CMAN uses
its own "reliable multicast" system whereas the DLM uses TCP. The disparity is
really only because CMAN needs to do cluster-wide broadcasts but the DLM only
ever needs to talk to single nodes at a time (per message).
 
> I was expecting that they would since having multiple
> interfaces for redundancy would be something they
> would both want.  DLM should just want to be able
> to send messages to other nodes and shouldn't care
> how it gets there.  I was expecting this to be
> part of CMAN since it should know which interfaces are
> connected to which nodes and their state.  It could
> also load balance on multiple networks.  Is there a
> description of how multiple interfaces are handle today?

The short answer is "rather badly". CMAN handles dual interfaces by a simple
failover if messages go missing. DLM gets the interface information from CMAN
but because of the nature of TCP the failover isn't nearly as clean
 

-- 

patrick


From erwan at seanodes.com  Thu Jan  6 10:03:30 2005
From: erwan at seanodes.com (Velu Erwan)
Date: Thu, 06 Jan 2005 11:03:30 +0100
Subject: [Linux-cluster] Test suite & benchmark
Message-ID: <1105005810.5455.71.camel@R1.seanodes.com>

Hi all,
I must assume this is a common request but I didn't find any clue about
it.

I'd like to know which test suite and/or benchmark tools you are using
to test and validate your gfs installation.

I mean validating the installation using a tool for stressing the lock
manager, testing concurrency open and/or write etc...

I saw some of you using bonnie which doesn't seems to do that, do you
know any other tool I could use ?

Thanks,


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Ceci est une partie de message num?riquement sign?e
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050106/3608e89f/attachment.sig>

From Hansjoerg.Maurer at dlr.de  Thu Jan  6 12:16:19 2005
From: Hansjoerg.Maurer at dlr.de (Hansjoerg.Maurer at dlr.de)
Date: Thu, 6 Jan 2005 13:16:19 +0100
Subject: [Linux-cluster] Some experiences and questions concerning GFS vs.
	GPFS 
Message-ID: <4CE5177FBED2784FAC715DB5553BD8970A3F28@exbe04.intra.dlr.de>

Hi

we are planing to implement a Linux Cluster Solution with shared SAN storage in Q2/2005.

We already tried RedHat GFS in a old test SAN environment, and it works great.

As an alternative solution we found a product called GPFS from IBM (gerneral parallel file system)

It seems to have some features, GFS does not have now, but according to the documentation, 
it seems to bee very complex and it seems to support only IBM Storage devices (FastT....).

The advantages seem to be
- filesystems up to 100 TB on IA32 (Blocksize up to 1MByte)
- syncronous replication of pools
- better scaling 
  If you have one RAID5 array in the SAN (lets call it RAID-A) and you add another RAID5 array
  (RAID-B) you can but them together, exceed the filesystem and reallocate the filesystem while it is online 
  to a stripeset over RAID-A AND RAID-B in order to get optimal performance.

The disadvantages seem to bee 
- the dependency on IBM Storage devices (especially for fencing)
- the complexity (fileaccess takes place over a userspace daemon, which caches data and stat information)
  This seems to be the reason they can achieve the file system size
- the integration of GFS into RHEL seems to be better of course... :-)


ok, now my questions 
What is the status of GFS for RHEL4 concerning the advantages of GPFS from above

- is it correct, that GFS filesystems in RHEL4 even on x86_64 can be very big (PByte) to? 
- will there be something similar like the reallocation of a stripe set over a newly created array in RHEL4 GFS?
- the possibility of syncronous mirroring is not so important in our special case...
- will the next stable version of GFS with the above features be available with initial Release of RHEL4 or 
  is there an other planed release date. We want to implement the SAN in Q2/2005, so that we can wait
  if some of the limitations of GFS will be negotiated until than.

And a final questions:
- has anybody experience with both products, so that he can tell me about advantages and disadvantages 
  (especially concerning performance). We will recieve a GPFS evaluation licence next week, but our
  old SAN storage hardware is not apropriate for performance tests, because it will be the bottleneck :-)

Sorry if this E-Mail is a bit off topic.
Sales persons are often showing you only the advantages of their product, 
and I hope that someone can help me with practical experiences.

If you think, that this is off topic, please answer directly.

Thank you very much

Greetings


Hansj?rg

  
We will  


_________________________________________________________________
Dr.  Hansjoerg Maurer           | LAN- & System-Manager
                                        |
Deutsches Zentrum               | DLR Oberpfaffenhofen
  f. Luft- und Raumfahrt e.V.   |
Institut f. Robotik             |
Postfach 1116                   | Muenchner Strasse 20
82230 Wessling                  | 82234 Wessling
Germany                         |
                                |
Tel: 08153/28-2431              | E-mail: Hansjoerg.Maurer at dlr.de
Fax: 08153/28-1134              | WWW: http://www.robotic.dlr.de/
__________________________________________________________________

There are 10 types of people in this world, 
those who understand binary and those who don't.


From tom at nethinks.com  Thu Jan  6 11:33:56 2005
From: tom at nethinks.com (tom at nethinks.com)
Date: Thu, 6 Jan 2005 12:33:56 +0100
Subject: [Linux-cluster] Again Problems with newest CVS
Message-ID: <OF928853AB.B8488414-ONC1256F81.003F6C8E-C1256F81.003F8906@nethinks.com>


Hi all,

getting the following errors any ideas?


make[3]: Entering directory `/usr/src/linux-2.6.9'
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/acl.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/bits.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/bmap.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/daemon.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/diaper.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/dio.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/dir.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/eaops.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/eattr.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/file.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/glock.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/glops.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/inode.o
  CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.o
In file included from /tmp/gfs/cluster/gfs-kernel/src/gfs/gfs.h:24,
                 from /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:24:
/tmp/gfs/cluster/gfs-kernel/src/gfs/incore.h:817: error: redefinition of
`struct gfs_args'
/tmp/gfs/cluster/gfs-kernel/src/gfs/incore.h:844: error: redefinition of
`struct gfs_tune'
In file included from /tmp/gfs/cluster/gfs-kernel/src/gfs/gfs.h:25,
                 from /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:24:
/tmp/gfs/cluster/gfs-kernel/src/gfs/util.h:321: error: redefinition of
`struct gfs_user_buffer'
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:44: warning: `struct gfs_ioctl'
declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:44: warning: its scope is only
this definition or declaration, which is probably not what you want
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:59: warning: `struct gfs_ioctl'
declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_skeleton':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:66: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:67: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:73: warning: passing arg 2 of
pointer to function from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:77: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:105: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_cookie':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:109: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:130: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_super':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:137: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:139: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:163: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:191: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_args':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:196: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:234: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_get_lockstruct':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:239: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:270: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_stat_gfs':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:275: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:335: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_counters':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:340: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:431: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_tune':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:436: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:505: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_set_tune':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:513: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:516: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:520: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:761: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_do_reclaim':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:768: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:798: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_do_shrink':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:802: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:817: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_get_file_stat':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:823: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:825: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:838: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:859: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_set_file_flag':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:868: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:871: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:882: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:967: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_get_file_meta':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:973: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:976: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:977: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1023: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_file_flush':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1025: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1040: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi2hip':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1044: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1047: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1072: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_get_hfile_stat':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1079: warning: passing arg 2 of
`gi2hip' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1083: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1096: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1117: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_hfile_read':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1126: warning: passing arg 2 of
`gi2hip' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1130: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1130: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1154: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_hfile_write':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1166: warning: passing arg 2 of
`gi2hip' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1170: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1170: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1173: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1186: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1186: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1258: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_hfile_trunc':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1267: warning: passing arg 2 of
`gi2hip' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1275: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1292: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_quota_sync':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1296: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1310: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_quota_refresh':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1318: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1321: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1354: warning: `struct
gfs_ioctl' declared inside parameter list
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
`gi_do_quota_read':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1362: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1364: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1367: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1392: error: dereferencing
pointer to incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gfs_ioctl_i':
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1410: error: storage size of
`gi' isn't known
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1415: error: invalid
application of `sizeof' to an incomplete type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1432: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1436: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1438: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1440: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1442: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1444: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1448: warning: passing arg 3 of
`gi_skeleton' from incompatible pointer type
/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1410: warning: unused variable
`gi'
make[4]: *** [/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.o] Error 1
make[3]: *** [_module_/tmp/gfs/cluster/gfs-kernel/src/gfs] Error 2

-tom


From lhh at redhat.com  Thu Jan  6 21:39:59 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 06 Jan 2005 16:39:59 -0500
Subject: [Linux-cluster] quorum problem
In-Reply-To: <EXMAILY6SzDqtxXD6U3000001be@Exmail.ccs.com.jo>
References: <EXMAILY6SzDqtxXD6U3000001be@Exmail.ccs.com.jo>
Message-ID: <1105047599.12800.1.camel@ayanami.boston.redhat.com>

On Tue, 2005-01-04 at 09:54 -0800, Yazan Bakheit wrote:
>         i mean that when i request the gui for the cluster suit there
> is a check box called 
>         (Has Quorum) and it is look to be hidden and i can't checked
> it even i have made


It's not an option; it's an indicator.  You can't change it.  The
cluster will change it when it has a majority of members online.

Try starting up both nodes.

-- Lon


From kpreslan at redhat.com  Fri Jan  7 06:44:39 2005
From: kpreslan at redhat.com (Ken Preslan)
Date: Fri, 7 Jan 2005 00:44:39 -0600
Subject: [Linux-cluster] Again Problems with newest CVS
In-Reply-To: <OF928853AB.B8488414-ONC1256F81.003F6C8E-C1256F81.003F8906@nethinks.com>
References: <OF928853AB.B8488414-ONC1256F81.003F6C8E-C1256F81.003F8906@nethinks.com>
Message-ID: <20050107064439.GA21295@potassium.msp.redhat.com>

You have an old version of gfs_ioctl.h somewhere.  Find it, replace it
with the new one, and try again.


On Thu, Jan 06, 2005 at 12:33:56PM +0100, tom at nethinks.com wrote:
> 
> 
> 
> 
> Hi all,
> 
> getting the following errors any ideas?
> 
> 
> make[3]: Entering directory `/usr/src/linux-2.6.9'
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/acl.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/bits.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/bmap.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/daemon.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/diaper.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/dio.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/dir.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/eaops.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/eattr.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/file.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/glock.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/glops.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/inode.o
>   CC [M]  /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.o
> In file included from /tmp/gfs/cluster/gfs-kernel/src/gfs/gfs.h:24,
>                  from /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:24:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/incore.h:817: error: redefinition of
> `struct gfs_args'
> /tmp/gfs/cluster/gfs-kernel/src/gfs/incore.h:844: error: redefinition of
> `struct gfs_tune'
> In file included from /tmp/gfs/cluster/gfs-kernel/src/gfs/gfs.h:25,
>                  from /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:24:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/util.h:321: error: redefinition of
> `struct gfs_user_buffer'
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:44: warning: `struct gfs_ioctl'
> declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:44: warning: its scope is only
> this definition or declaration, which is probably not what you want
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:59: warning: `struct gfs_ioctl'
> declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_skeleton':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:66: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:67: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:73: warning: passing arg 2 of
> pointer to function from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:77: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:105: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_cookie':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:109: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:130: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_super':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:137: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:139: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:163: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:191: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_args':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:196: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:234: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_get_lockstruct':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:239: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:270: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_stat_gfs':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:275: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:335: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_counters':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:340: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:431: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_get_tune':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:436: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:505: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_set_tune':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:513: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:516: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:520: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:761: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_do_reclaim':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:768: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:798: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi_do_shrink':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:802: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:817: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_get_file_stat':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:823: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:825: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:838: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:859: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_set_file_flag':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:868: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:871: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:882: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:967: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_get_file_meta':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:973: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:976: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:977: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1023: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_file_flush':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1025: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1040: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gi2hip':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1044: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1047: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1072: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_get_hfile_stat':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1079: warning: passing arg 2 of
> `gi2hip' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1083: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1096: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1117: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_hfile_read':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1126: warning: passing arg 2 of
> `gi2hip' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1130: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1130: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1137: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1154: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_hfile_write':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1166: warning: passing arg 2 of
> `gi2hip' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1170: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1170: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1173: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1186: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1186: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1223: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1258: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_hfile_trunc':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1267: warning: passing arg 2 of
> `gi2hip' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1275: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1292: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_quota_sync':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1296: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1310: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_quota_refresh':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1318: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1321: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: At top level:
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1354: warning: `struct
> gfs_ioctl' declared inside parameter list
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function
> `gi_do_quota_read':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1362: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1364: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1367: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1392: error: dereferencing
> pointer to incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c: In function `gfs_ioctl_i':
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1410: error: storage size of
> `gi' isn't known
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1415: error: invalid
> application of `sizeof' to an incomplete type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1432: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1436: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1438: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1440: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1442: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1444: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1448: warning: passing arg 3 of
> `gi_skeleton' from incompatible pointer type
> /tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.c:1410: warning: unused variable
> `gi'
> make[4]: *** [/tmp/gfs/cluster/gfs-kernel/src/gfs/ioctl.o] Error 1
> make[3]: *** [_module_/tmp/gfs/cluster/gfs-kernel/src/gfs] Error 2
> 
> -tom
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Ken Preslan <kpreslan at redhat.com>


From pcaulfie at redhat.com  Fri Jan  7 10:46:31 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 7 Jan 2005 10:46:31 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050105090043.GA3866@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050104112924.GB23831@tykepenguin.com>
	<1104878776.2838.42.camel@ibm-c.pdx.osdl.net>
	<20050105090043.GA3866@tykepenguin.com>
Message-ID: <20050107104630.GB3614@tykepenguin.com>

OK, some more investigation seems to be pointing to the heartbeat thread being
not woken up when it's timer tells it to.

This might be simply that there are other higher-priority tasks happening on the
system because of the IO load.

Now I've upgraded my cluster to 2.6.10, iSCSI seems to be more stable (same
iSCSI software interestingly) so I've set the heartbeat nice level to -20 (same
as the iSCSI process) and I'll see if it survives the weekend. It's done
overnight so far which is better than I've had yet 8)

-- 

patrick


From ptr at poczta.fm  Fri Jan  7 13:45:10 2005
From: ptr at poczta.fm (ptr at poczta.fm)
Date: 07 Jan 2005 14:45:10 +0100
Subject: [Linux-cluster] Current CVS and 2.6.9
Message-ID: <20050107134510.D49002599CB@poczta.interia.pl>


   Hello.

I decided recently to upgrade GFS version due to mysterious hard nodes lockups. Unfortunatelly attempts to build GFS userland and kernel modules failed, with following errors:

/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c: In function `gfs_lock':
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1448: warning: implicit declaration of function `posix_lock_file_wait'
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c: In function `do_flock':
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1529: warning: implicit declaration of function `flock_lock_file_wait'
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c: At top level:
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1622: error: unknown field `flock' specified in initializer
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1622: warning: initialization from incompatible pointer type
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1632: error: unknown field `flock' specified in initializer
/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.c:1632: warning: initialization from incompatible pointer type
make[5]: *** [/install/GFS/cluster/gfs-kernel/src/gfs/ops_file.o] Error 1
make[4]: *** [_module_/install/GFS/cluster/gfs-kernel/src/gfs] Error 2
make[4]: Leaving directory `/usr/src/linux-2.6.8.1'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/install/GFS/cluster/gfs-kernel/src/gfs'
make[2]: *** [install] Error 2
make[2]: Leaving directory `/install/GFS/cluster/gfs-kernel/src'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/install/GFS/cluster/gfs-kernel'
make: *** [all] Error 2

I also received a bunch of warnings, like:
*** Warning: "kcl_addref_cluster" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_get_node_by_addr" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_get_node_addresses" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_releaseref_cluster" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_get_current_interface" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_get_node_by_nodeid" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_leave_service" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_remove_callback" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_global_service_id" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_unregister_service" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_join_service" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_start_done" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_add_callback" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!
*** Warning: "kcl_register_service" [/install/GFS/cluster/dlm-kernel/src/dlm.ko] undefined!

Both attempts failed - I tried to build new GFS from sources using freshly build (but not installed!) kerlne 2.6.8.9, the same concerns 2.6.8.1 unpacked form vanilla sources. No such think occured when I was building GFS the "old way" (patching kernel sources separately). I'm running currently CVS version 2.0.1 and sometimes without _any_ suspisious reason both nodes in cluster freeze - no chance to reboot them other way than by hard reset. No error logs to debug.
   TIA for your help, regards.

Piotr

----------------------------------------------------------------------
Startuj z INTERIA.PL!!! >>> http://link.interia.pl/f1837


From daniel at osdl.org  Tue Jan 11 00:50:20 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Mon, 10 Jan 2005 16:50:20 -0800
Subject: [Linux-cluster] mount hang during test runs
Message-ID: <1105404619.30484.7.camel@ibm-c.pdx.osdl.net>

I started another test run on last week and let it run
over the week end.  a 3 node test was running when it hung.

I set /proc/cluster/config/cman/max_retries to 9
and /proc/cluster/config/cman/hello_timer to 1

This time I hit a mount hang.  The mount is hung on cl032:

mount         D C170F414     0 18375  18369                     (NOTLB)
e2dbbc20 00000082 e1dbda10 c170f414 0003e36e 00000000 00000008 c011bb10
       d5ea8d58 57435700 0003e36e c18880ac e2dbbc00 e1dbda10 00000000 c170f8c0
       c170ef60 00000000 000038d3 57435987 0003e36e e1dbcf50 e1dbd0b8 00000000
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f8a92ed2>] kcl_join_service+0x162/0x1a0 [cman]
 [<f8966fbf>] init_mountgroup+0x6f/0xc0 [lock_dlm]
 [<f8969411>] lm_dlm_mount+0xa1/0xf0 [lock_dlm]
 [<f8812355>] lm_mount+0x155/0x250 [lock_harness]
 [<f8affa0d>] gfs_lm_mount+0x1fd/0x390 [gfs]
 [<f8b0ee53>] fill_super+0x513/0x1330 [gfs]
 [<f8b0fe49>] gfs_get_sb+0x199/0x210 [gfs]
 [<c0168e4c>] do_kern_mount+0x5c/0x110
 [<c0180138>] do_new_mount+0x98/0xe0
 [<c0180905>] do_mount+0x165/0x1b0
 [<c0180dd5>] sys_mount+0xb5/0x140
 [<c010537d>] sysenter_past_esp+0x52/0x71

Looks like a problem join the mount group.

/proc/cluster/services shows:

[root at cl030 cman]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]
 
DLM Lock Space:  "stripefs"                        324 693 run       -
[1 2 3]
 
GFS Mount Group: "stripefs"                        325 694 update    U-4,1,3
[1 2 3]
 
[root at cl031 cluster]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

DLM Lock Space:  "stripefs"                        324 457 run       -
[1 2 3]
 
GFS Mount Group: "stripefs"                        325 458 update    U-4,1,3
[1 2 3]

[root at cl032 cluster]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]
 
DLM Lock Space:  "stripefs"                        324 225 run       -
[1 2 3]
 
GFS Mount Group: "stripefs"                        325 226 join      S-6,20,3
[1 2 3]
 

I collected stack traces and a bunch of other info.  It is
available here:
http://developer.osdl.org/daniel/GFS/mount.hang.05jan2005/

Any ideas on debugging this one?

Daniel


From pcaulfie at redhat.com  Tue Jan 11 08:56:08 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Jan 2005 08:56:08 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050111085608.GA6645@tykepenguin.com>

On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> How long does cman stay up in your testing?

With the higher pririty on the heartbeat thread I got 5 days before iSCSI died
on me again... This isn't quite the same load as yours but it is on 8 busy nodes.

-- 

patrick


From serge at triumvirat.ru  Tue Jan 11 08:34:24 2005
From: serge at triumvirat.ru (Sergey)
Date: Tue, 11 Jan 2005 11:34:24 +0300
Subject: [Linux-cluster] some questions about setting up GFS
Message-ID: <1125914338.20050111113424@triumvirat.ru>

Hello!

We bought HP ProLiant DL380 G4 Packaged Cluster-MSA500 G2 server and
after installation of RHEL3 and GFS-6.0.0-15 I have some questions.

Because I have no expirience in setting up such systems, please, tell
me, which mistakes in configuration I made.

Now system is configured this way:

/dev/cciss/c0d1 - External Logical Volume, 293.6 Gbytes (RAID 5)

===================
[root at hp1 root]# fdisk /dev/cciss/c0d1
Command (m for help): p

Disk /dev/cciss/c0d1: 293.6 GB, 293626045440 bytes
255 heads, 63 sectors/track, 35698 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot    Start       End    Blocks   Id  System
/dev/cciss/c0d1p1             1         9     72261   fd  Linux raid autodetect
/dev/cciss/c0d1p2            10     35698 286671892+  fd  Linux raid autodetect

I've found nothing in GFS documentation about partitioning hard drive
during setting up GFS. So, I used to experiment.

First partition is allocated for <- CCA device ->. I'd like to know if
there is enough space and right type of partition, and, at all, is it
right way to allocate <- CCA device ->.

Second partition was formatted as GFS, so the question is: is selected
type of partition right or not?

===================

[root at hp1 root]# cat pool0.cfg
poolname pool0
minor 0 subpools 2
subpool 0 0 1 gfs_journal
pooldevice 0 0 /dev/cciss/c0d1p1
subpool 1 0 1 gfs_data
pooldevice 1 0 /dev/cciss/c0d1p2


Actually, I don't know why, but first subpool I've made as gfs_journal :-)

Basically, the system works, but something may be wrong.

===================
During setting up I've made this command:

[root at hp1 root]# ccs_tool create /root/cluster/ /dev/cciss/c0d1p1

[root at hp1 root]# pool_tool -s
 Device                                            Pool Label
 ======                                            ==========
 /dev/cciss/c0d0                  <- partition information ->
 /dev/cciss/c0d0p1                    <- EXT2/3 filesystem ->
 /dev/cciss/c0d0p2                          <- swap device ->
 /dev/cciss/c0d0p3                    <- EXT2/3 filesystem ->
 /dev/cciss/c0d1                  <- partition information ->
 /dev/cciss/c0d1p1                           <- CCA device ->
 /dev/cciss/c0d1p2                       <- GFS filesystem ->

I'd like to hear some comments on it.

===================

Thanks.

--
Sergey Mikhnevich


From Vincent.Aniello at PipelineTrading.com  Tue Jan 11 13:45:52 2005
From: Vincent.Aniello at PipelineTrading.com (Vincent Aniello)
Date: Tue, 11 Jan 2005 08:45:52 -0500
Subject: [Linux-cluster] Multipath I/O
Message-ID: <834F55E6F1BE3B488AD3AFC927A0970018B873@EMAILSRV1.exad.net>

Do I need to use the QLogic failover driver with GFS for multipath I/O
or does GFS handle multipath I/O on its own?
 
Thanks for your input.
 
--Vincent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050111/d1b9c56a/attachment.htm>

From serge at triumvirat.ru  Tue Jan 11 08:34:24 2005
From: serge at triumvirat.ru (Sergey)
Date: Tue, 11 Jan 2005 11:34:24 +0300
Subject: [Linux-cluster] some questions about setting up GFS
Message-ID: <1125914338.20050111113424@triumvirat.ru>

Hello!

We bought HP ProLiant DL380 G4 Packaged Cluster-MSA500 G2 server and
after installation of RHEL3 and GFS-6.0.0-15 I have some questions.

Because I have no expirience in setting up such systems, please, tell
me, which mistakes in configuration I made.

Now system is configured this way:

/dev/cciss/c0d1 - External Logical Volume, 293.6 Gbytes (RAID 5)

===================
[root at hp1 root]# fdisk /dev/cciss/c0d1
Command (m for help): p

Disk /dev/cciss/c0d1: 293.6 GB, 293626045440 bytes
255 heads, 63 sectors/track, 35698 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot    Start       End    Blocks   Id  System
/dev/cciss/c0d1p1             1         9     72261   fd  Linux raid autodetect
/dev/cciss/c0d1p2            10     35698 286671892+  fd  Linux raid autodetect

I've found nothing in GFS documentation about partitioning hard drive
during setting up GFS. So, I used to experiment.

First partition is allocated for <- CCA device ->. I'd like to know if
there is enough space and right type of partition, and, at all, is it
right way to allocate <- CCA device ->.

Second partition was formatted as GFS, so the question is: is selected
type of partition right or not?

===================

[root at hp1 root]# cat pool0.cfg
poolname pool0
minor 0 subpools 2
subpool 0 0 1 gfs_journal
pooldevice 0 0 /dev/cciss/c0d1p1
subpool 1 0 1 gfs_data
pooldevice 1 0 /dev/cciss/c0d1p2


Actually, I don't know why, but first subpool I've made as gfs_journal :-)

Basically, the system works, but something may be wrong.

===================
During setting up I've made this command:

[root at hp1 root]# ccs_tool create /root/cluster/ /dev/cciss/c0d1p1

[root at hp1 root]# pool_tool -s
 Device                                            Pool Label
 ======                                            ==========
 /dev/cciss/c0d0                  <- partition information ->
 /dev/cciss/c0d0p1                    <- EXT2/3 filesystem ->
 /dev/cciss/c0d0p2                          <- swap device ->
 /dev/cciss/c0d0p3                    <- EXT2/3 filesystem ->
 /dev/cciss/c0d1                  <- partition information ->
 /dev/cciss/c0d1p1                           <- CCA device ->
 /dev/cciss/c0d1p2                       <- GFS filesystem ->

I'd like to hear some comments on it.

===================

Thanks.

--
Sergey Mikhnevich


From jbrassow at redhat.com  Tue Jan 11 17:30:26 2005
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 11 Jan 2005 11:30:26 -0600
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <1125914338.20050111113424@triumvirat.ru>
References: <1125914338.20050111113424@triumvirat.ru>
Message-ID: <7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>

It looks like you are not using pool.

You seem to have divided up the storage sanely.  However, rather than 
forming a pool logical volume on the partitions you've created and then 
putting ccs and gfs on the pool volumes, you have simply put ccs and 
gfs directly on the underlying device.  This is not terrible if you can 
ensure that your devices will _always_ have the same name, regardless 
of the machine you are viewing them from.  (Pool's main function is to 
write labels to the underlying devices so that they can be uniquely 
identified on every machine in the cluster.)

So, at this point, you can choose to forget about using pool and 
proceed as you have started; or you can set up your pools first and put 
ccs and gfs on them (this is the method normally used).

If you choose to set up pools, you would do something like:
# create config files for two different pools (one for ccs and one for 
gfs)
prompt> cat > cca_pool.cfg
poolname cca
subpools 1
subpool 0 0 1 gfs_data
pooldevice 0 0 /dev/cciss/c0d1p1

prompt> cat > gfs1_pool.cfg
poolname gfs1
subpools 1
subpool 0 0 1 gfs_data
pooldevice 0 0 /dev/cciss/c0d1p2

#Write the labels to disk - remember this only needs to be done once
prompt> pool_tool cca_pool.cfg
prompt> pool_tool gfs1_pool.cfg

#Instantiate the pool logical volumes
prompt> pool_assemble

#Now you have block devices called /dev/pool/cca and /dev/pool/gfs1
# create your CCS archive and gfs file system on these devices
prompt> ccs_tool create /root/cluster /dev/pool/cca
prompt> mkfs.gfs ... /dev/pool/gfs1

  brassow

On Jan 11, 2005, at 2:34 AM, Sergey wrote:

> Hello!
>
> We bought HP ProLiant DL380 G4 Packaged Cluster-MSA500 G2 server and
> after installation of RHEL3 and GFS-6.0.0-15 I have some questions.
>
> Because I have no expirience in setting up such systems, please, tell
> me, which mistakes in configuration I made.
>
> Now system is configured this way:
>
> /dev/cciss/c0d1 - External Logical Volume, 293.6 Gbytes (RAID 5)
>
> ===================
> [root at hp1 root]# fdisk /dev/cciss/c0d1
> Command (m for help): p
>
> Disk /dev/cciss/c0d1: 293.6 GB, 293626045440 bytes
> 255 heads, 63 sectors/track, 35698 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>            Device Boot    Start       End    Blocks   Id  System
> /dev/cciss/c0d1p1             1         9     72261   fd  Linux raid 
> autodetect
> /dev/cciss/c0d1p2            10     35698 286671892+  fd  Linux raid 
> autodetect
>
> I've found nothing in GFS documentation about partitioning hard drive
> during setting up GFS. So, I used to experiment.
>
> First partition is allocated for <- CCA device ->. I'd like to know if
> there is enough space and right type of partition, and, at all, is it
> right way to allocate <- CCA device ->.
>
> Second partition was formatted as GFS, so the question is: is selected
> type of partition right or not?
>
> ===================
>
> [root at hp1 root]# cat pool0.cfg
> poolname pool0
> minor 0 subpools 2
> subpool 0 0 1 gfs_journal
> pooldevice 0 0 /dev/cciss/c0d1p1
> subpool 1 0 1 gfs_data
> pooldevice 1 0 /dev/cciss/c0d1p2
>
>
> Actually, I don't know why, but first subpool I've made as gfs_journal 
> :-)
>
> Basically, the system works, but something may be wrong.
>
> ===================
> During setting up I've made this command:
>
> [root at hp1 root]# ccs_tool create /root/cluster/ /dev/cciss/c0d1p1
>
> [root at hp1 root]# pool_tool -s
>  Device                                            Pool Label
>  ======                                            ==========
>  /dev/cciss/c0d0                  <- partition information ->
>  /dev/cciss/c0d0p1                    <- EXT2/3 filesystem ->
>  /dev/cciss/c0d0p2                          <- swap device ->
>  /dev/cciss/c0d0p3                    <- EXT2/3 filesystem ->
>  /dev/cciss/c0d1                  <- partition information ->
>  /dev/cciss/c0d1p1                           <- CCA device ->
>  /dev/cciss/c0d1p2                       <- GFS filesystem ->
>
> I'd like to hear some comments on it.
>
> ===================
>
> Thanks.
>
> --
> Sergey Mikhnevich
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
>


From yazan at ccs.com.jo  Tue Jan 11 15:10:57 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Tue, 11 Jan 2005 17:10:57 +0200
Subject: [Linux-cluster] GFS probelm
Message-ID: <001901c4f7ef$c0a304d0$69050364@yazanz>

hi,

  i have RedHat enterprise linux ES v 3.0 Update 4 (the latest)

  and i have GFS version as :

   GFS-modules-smp-6.0.0-1.2
 GFS-6.0.0-0.6.i686

  when i installed the first rpm the system says that i must have the
 ( kernel-smp-2.4.21-15.EL.rpm )to run this.

  but i have a newer kernel by Update4.

  so ,can i found a solution for that problem ?
  and can i download a newer version of GFS to run with my updated kernel.

  cause i now have a newer updates that cannot work with the old gfs i am
using .

   ???????????.


Thank you
yazan.


From danderso at redhat.com  Tue Jan 11 19:05:17 2005
From: danderso at redhat.com (Derek Anderson)
Date: Tue, 11 Jan 2005 13:05:17 -0600
Subject: [Linux-cluster] GFS probelm
In-Reply-To: <001901c4f7ef$c0a304d0$69050364@yazanz>
References: <001901c4f7ef$c0a304d0$69050364@yazanz>
Message-ID: <200501111305.18337.danderso@redhat.com>

On Tuesday 11 January 2005 09:10, Yazan Al-Sheyyab wrote:
> hi,
>
>   i have RedHat enterprise linux ES v 3.0 Update 4 (the latest)
>
>   and i have GFS version as :
>
>    GFS-modules-smp-6.0.0-1.2
>  GFS-6.0.0-0.6.i686
>
>   when i installed the first rpm the system says that i must have the
>  ( kernel-smp-2.4.21-15.EL.rpm )to run this.
>
>   but i have a newer kernel by Update4.
>
>   so ,can i found a solution for that problem ?
>   and can i download a newer version of GFS to run with my updated kernel.

kernel-2.4.21-27  -- GFS-6.0.2-17
kernel-2.4.21-27.0.1  --  GFS-6.0.2-24

The latest GFS available on RHN matches the latest kernel available on RHN.

>
>   cause i now have a newer updates that cannot work with the old gfs i am
> using .
>
>    ???????????.
>
>
> Thank you
> yazan.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From bujan at isqsolutions.com  Tue Jan 11 19:58:09 2005
From: bujan at isqsolutions.com (Manuel Bujan)
Date: Tue, 11 Jan 2005 14:58:09 -0500
Subject: [Linux-cluster] GFS hang when one node fail
Message-ID: <009801c4f817$dfbe3f60$0c9ce142@pcbujan>

Hi,

Is there any possibility for a two-node GFS installation to continue working when one of the nodes fail abruptly?

Do I have to wait for the fence to be done and the failed system become operational again, to resume activity ?

We are using the latest CVS code in a 2.6.9 linux kernel.

Regards
Bujan

PD: We are already using CMAN with two_node="1" expected_votes="1"

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050111/ebcc9771/attachment.htm>

From jbrassow at redhat.com  Wed Jan 12 00:47:44 2005
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 11 Jan 2005 18:47:44 -0600
Subject: [Linux-cluster] Multipath I/O
In-Reply-To: <834F55E6F1BE3B488AD3AFC927A0970018B873@EMAILSRV1.exad.net>
References: <834F55E6F1BE3B488AD3AFC927A0970018B873@EMAILSRV1.exad.net>
Message-ID: <922AA8AE-6433-11D9-85A8-000A957BB1F6@redhat.com>

GFS <= 6.0 will handle multipath automatically.

  brassow

On Jan 11, 2005, at 7:45 AM, Vincent Aniello wrote:

> Do I need to use the QLogic failover driver with GFS for multipath I/O 
> or does GFS handle multipath I/O on its own?
> ?
> Thanks for your input.
> ?
> --Vincent
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 595 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050111/00718595/attachment.bin>

From daniel at osdl.org  Wed Jan 12 01:00:46 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Tue, 11 Jan 2005 17:00:46 -0800
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050111085608.GA6645@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050111085608.GA6645@tykepenguin.com>
Message-ID: <1105491645.30484.23.camel@ibm-c.pdx.osdl.net>

On Tue, 2005-01-11 at 00:56, Patrick Caulfield wrote:
> On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> > How long does cman stay up in your testing?
> 
> With the higher pririty on the heartbeat thread I got 5 days before iSCSI died
> on me again... This isn't quite the same load as yours but it is on 8 busy nodes.

I have not seen 5 days yet on my set.  See my email from yesterday.
Is the code to have higher priority for the heartbeat thread
already checked in?  I restarted my test yesterday and it is
still going, but it usually has trouble after 50 hours or so.

Daniel


From Vincent.Aniello at PipelineTrading.com  Wed Jan 12 03:00:54 2005
From: Vincent.Aniello at PipelineTrading.com (Vincent Aniello)
Date: Tue, 11 Jan 2005 22:00:54 -0500
Subject: [Linux-cluster] Multipath I/O
Message-ID: <834F55E6F1BE3B488AD3AFC927A0970018B8D9@EMAILSRV1.exad.net>

So versions after 6.0 no longer handle multipath I/O automatically?
 
--Vincent

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E
Brassow
Sent: Tuesday, January 11, 2005 7:48 PM
To: linux clistering
Subject: Re: [Linux-cluster] Multipath I/O


GFS <= 6.0 will handle multipath automatically. 

brassow 

On Jan 11, 2005, at 7:45 AM, Vincent Aniello wrote: 


	Do I need to use the QLogic failover driver with GFS for
multipath I/O or does GFS handle multipath I/O on its own? 
	Thanks for your input. 
	--Vincent 
	-- 
	Linux-cluster mailing list 
	Linux-cluster at redhat.com 
	http://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050111/bc3288ca/attachment.htm>

From teigland at redhat.com  Wed Jan 12 03:47:50 2005
From: teigland at redhat.com (David Teigland)
Date: Wed, 12 Jan 2005 11:47:50 +0800
Subject: [Linux-cluster] GFS hang when one node fail
In-Reply-To: <009801c4f817$dfbe3f60$0c9ce142@pcbujan>
References: <009801c4f817$dfbe3f60$0c9ce142@pcbujan>
Message-ID: <20050112034750.GA6184@redhat.com>

On Tue, Jan 11, 2005 at 02:58:09PM -0500, Manuel Bujan wrote:
> Hi,
> 
> Is there any possibility for a two-node GFS installation to continue
> working when one of the nodes fail abruptly?

yes

> Do I have to wait for the fence to be done 

yes, the remaining node will fence the failed node

> and the failed system become operational again, to resume activity ?

no, the remaining node will run fine on its own

> PD: We are already using CMAN with two_node="1" expected_votes="1"

that's correct

-- 
Dave Teigland  <teigland at redhat.com>


From teigland at redhat.com  Wed Jan 12 06:45:23 2005
From: teigland at redhat.com (David Teigland)
Date: Wed, 12 Jan 2005 14:45:23 +0800
Subject: [Linux-cluster] mount hang during test runs
In-Reply-To: <1105404619.30484.7.camel@ibm-c.pdx.osdl.net>
References: <1105404619.30484.7.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050112064523.GB7571@redhat.com>

On Mon, Jan 10, 2005 at 04:50:20PM -0800, Daniel McNeil wrote:

> I collected stack traces and a bunch of other info.  It is
> available here:
> http://developer.osdl.org/daniel/GFS/mount.hang.05jan2005/
> 
> Any ideas on debugging this one?


- Processes on cl032 and cl030 are blocked waiting for dlm responses from
  cl031.

- Processes on cl031 are blocked waiting for dlm responses to resource
  directory lookups (looking up unknown resource masters for 10,0 and 3,11).

- It looks like dlm_recvd may be stuck on cl031 preventing it from
  receiving the requests from the other two nodes and preventing it
  from receiving the responses to its own lookup requests.  This is
  probably the crux of the problem.  Unfortunately, all we see for
  dlm_recvd on cl031 (from stack.cl031) is:

dlm_recvd     R running     0 29053      6         29054 29052 (L-TLB)


cl032 - requesting PR on 10,1 (mounting)
----------------------------------------

lock_dlm2     D C170F414     0 18399      4               18398 (L-TLB)
e6a1fe04 00000046 e7639930 c170f414 0003e36e 00000018 00000008 00000000 
       d5ea8d58 7505db9d 0003e36e db8ff348 e6a1fdf8 e7639930 00000000 c170f8c0 
       c170ef60 00000000 000138a5 7505df29 0003e36e f4377170 f43772d8 00000000 
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f8968139>] lm_dlm_lock_sync+0x59/0x70 [lock_dlm]
 [<f8966163>] id_test_and_set+0xa3/0x260 [lock_dlm]
 [<f8966597>] claim_jid+0x47/0x120 [lock_dlm]
 [<f8966c3d>] process_start+0x46d/0x610 [lock_dlm]
 [<f896ca54>] dlm_async+0x274/0x3c0 [lock_dlm]
 [<c0134cca>] kthread+0xba/0xc0
 [<c0103325>] kernel_thread_helper+0x5/0x10


cl031 - requesting PR on 10,0
-----------------------------

lock_dlm1     D C170EF9C     0 29065      6         29066 29054 (L-TLB)
d2e0ede8 00000046 f76d3850 c170ef9c 0003e354 00000018 00000008 00000000 
       f6750838 30672ddf 0003e354 dbf900dc d2e0eddc f76d3850 00000000 c170f8c0 
       c170ef60 00000000 0002088a 306734a4 0003e354 f64d8710 f64d8878 00000000 
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f8968139>] lm_dlm_lock_sync+0x59/0x70 [lock_dlm]
 [<f8966443>] id_value+0x93/0x130 [lock_dlm]
 [<f896650f>] id_find+0x2f/0x70 [lock_dlm]
 [<f896670a>] discover_jids+0x6a/0xa0 [lock_dlm]
 [<f8966ab8>] process_start+0x2e8/0x610 [lock_dlm]
 [<f896ca54>] dlm_async+0x274/0x3c0 [lock_dlm]
 [<c0134cca>] kthread+0xba/0xc0
 [<c0103325>] kernel_thread_helper+0x5/0x10


cl031 - requesting NL on 3,11
-----------------------------

df            D 00000008     0 29088  29086                     (NOTLB)
dd0e5c14 00000082 dd0e5c04 00000008 00000001 f8b3b571 00000008 dd0e5c0c 
       ecb0a568 dbf9002c d6e5415c 00000008 dd0e5c44 00000018 00000000 00000000 
       c170ef60 00000000 00000fec 4d5f5234 0003e3a1 f6789190 f67892f8 dd0e5c44 
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f896804b>] do_dlm_lock_sync+0x4b/0x60 [lock_dlm]
 [<f89683d4>] hold_null_lock+0xb4/0xd0 [lock_dlm]
 [<f8968470>] lm_dlm_hold_lvb+0x40/0x50 [lock_dlm]
 [<f8afff2c>] gfs_lm_hold_lvb+0x3c/0x50 [gfs]
 [<f8af49a1>] gfs_lvb_hold+0x41/0xe0 [gfs]
 [<f8b19c13>] gfs_ri_update+0x1d3/0x250 [gfs]
 [<f8b19d78>] gfs_rindex_hold+0xe8/0x100 [gfs]
 [<f8b1d781>] gfs_stat_gfs+0x21/0x80 [gfs]
 [<f8b131e0>] gfs_statfs+0x30/0xd0 [gfs]
 [<c015e8ac>] vfs_statfs+0x4c/0x70
 [<c015e9cb>] vfs_statfs64+0x1b/0x50
 [<c015eb07>] sys_statfs64+0x67/0xa0
 [<c010537d>] sysenter_past_esp+0x52/0x71


cl030 - requesting PR on 10,1
-----------------------------

lock_dlm2     D 00000008     0 14338      6               14337 (L-TLB)
cf1b4de8 00000046 cf1b4dd8 00000008 00000001 00000018 00000008 00000000 
       f600ec98 00000000 00000000 cbe5ed24 cf1b4ddc 00000000 f7b82054 cf1b4df8 
       c170ef60 00000000 00014966 b62fc6b6 00009f97 f6610730 f6610898 00000009 
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f8b57139>] lm_dlm_lock_sync+0x59/0x70 [lock_dlm]
 [<f8b55443>] id_value+0x93/0x130 [lock_dlm]
 [<f8b5550f>] id_find+0x2f/0x70 [lock_dlm]
 [<f8b5570a>] discover_jids+0x6a/0xa0 [lock_dlm]
 [<f8b55ab8>] process_start+0x2e8/0x610 [lock_dlm]
 [<f8b5ba54>] dlm_async+0x274/0x3c0 [lock_dlm]
 [<c0134cca>] kthread+0xba/0xc0
 [<c0103325>] kernel_thread_helper+0x5/0x10


cl030 - requesting NL on 3,11
-----------------------------

df            D 00000008     0 14362  14360                     (NOTLB)
d10a3c14 00000086 d10a3c04 00000008 00000001 f8b3b571 00000008 d10a3c0c 
       f6b89818 cbe5ec74 c2015b28 00000008 d10a3c44 00000018 00000000 00000000 
       c170ef60 00000000 000305ef f0cf7f52 00009fe4 da6f0f10 da6f1078 d10a3c44 
Call Trace:
 [<c03dbac4>] wait_for_completion+0xa4/0xe0
 [<f8b5704b>] do_dlm_lock_sync+0x4b/0x60 [lock_dlm]
 [<f8b573d4>] hold_null_lock+0xb4/0xd0 [lock_dlm]
 [<f8b57470>] lm_dlm_hold_lvb+0x40/0x50 [lock_dlm]
 [<f8afff2c>] gfs_lm_hold_lvb+0x3c/0x50 [gfs]
 [<f8af49a1>] gfs_lvb_hold+0x41/0xe0 [gfs]
 [<f8b19c13>] gfs_ri_update+0x1d3/0x250 [gfs]
 [<f8b19d78>] gfs_rindex_hold+0xe8/0x100 [gfs]
 [<f8b1d781>] gfs_stat_gfs+0x21/0x80 [gfs]
 [<f8b131e0>] gfs_statfs+0x30/0xd0 [gfs]
 [<c015e8ac>] vfs_statfs+0x4c/0x70
 [<c015e9cb>] vfs_statfs64+0x1b/0x50
 [<c015eb07>] sys_statfs64+0x67/0xa0
 [<c010537d>] sysenter_past_esp+0x52/0x71


cl032 (nodeid 3, mounting and looking for free jid)
---------------------------------------------------

Resource dfdbf26c (parent 00000000). Name (len=24) "      10               1"  
Local Copy, Master is node 2
Granted Queue
Conversion Queue
Waiting Queue
000102aa -- (PR) Master:     00000000  LQ: 3,0x9 (pid 18399)


cl031 (nodeid 2, jid 1)
-----------------------

Resource cc0100a4 (parent 00000000). Name (len=24) "      10               1"  
Master Copy
LVB: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Granted Queue
000100d5 PR (pid 29066)
Conversion Queue
Waiting Queue

Resource e16fe26c (parent 00000000). Name (len=24) "      10               0"  
Local Copy, Master is node -1
Granted Queue
Conversion Queue
Waiting Queue

Resource e4b5573c (parent 00000000). Name (len=24) "       3              11"  
Local Copy, Master is node -1
Granted Queue
Conversion Queue
Waiting Queue


cl030 (nodeid 1, jid 0)
-----------------------

Resource cfb9054c (parent 00000000). Name (len=24) "      10               0"  
Master Copy
LVB: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Granted Queue
000102c3 PR (pid 14338)
Conversion Queue
Waiting Queue

Resource d798911c (parent 00000000). Name (len=24) "      10               1"  
Local Copy, Master is node 2
Granted Queue
Conversion Queue
Waiting Queue
000103b7 -- (PR) Master:     00000000  LQ: 3,0x9 (pid 14338)

Resource d38d7b2c (parent 00000000). Name (len=24) "       3              11"  
Local Copy, Master is node 2
Granted Queue
Conversion Queue
Waiting Queue
0002022e -- (NL) Master:     00000000  LQ: 3,0x8 (pid 14362)


-- 
Dave Teigland  <teigland at redhat.com>


From pcaulfie at redhat.com  Wed Jan 12 08:58:12 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 12 Jan 2005 08:58:12 +0000
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <1105491645.30484.23.camel@ibm-c.pdx.osdl.net>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050111085608.GA6645@tykepenguin.com>
	<1105491645.30484.23.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050112085812.GI6645@tykepenguin.com>

On Tue, Jan 11, 2005 at 05:00:46PM -0800, Daniel McNeil wrote:
> On Tue, 2005-01-11 at 00:56, Patrick Caulfield wrote:
> > On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> > > How long does cman stay up in your testing?
> > 
> > With the higher pririty on the heartbeat thread I got 5 days before iSCSI died
> > on me again... This isn't quite the same load as yours but it is on 8 busy nodes.
> 
> I have not seen 5 days yet on my set.  See my email from yesterday.
> Is the code to have higher priority for the heartbeat thread
> already checked in?  I restarted my test yesterday and it is
> still going, but it usually has trouble after 50 hours or so.
> 

It's rev 1.45 of membership.c checked in on the 7th Jan. If that hasn't fixed it
I'll have to dabble with realtime things as it does seem now that the threads
are not being woken up, even though the timer is firing.

-- 

patrick


From ptr at poczta.fm  Wed Jan 12 10:38:49 2005
From: ptr at poczta.fm (ptr at poczta.fm)
Date: 12 Jan 2005 11:38:49 +0100
Subject: [Linux-cluster] Log entry
Message-ID: <20050112103849.3DD8B3031E3@poczta.interia.pl>


   Hello.
I'm receiving entries like the one below in my system logs.
It's 2-nodes cluster  built form CVS.

-node1:

dlm: lkb
id 52cd01b3
remid 4c730361
flags 0
status 3
rqmode 5
grmode 3
nodeid 1
lqstate 2
lqflags 44
dlm: request
rh_cmd 6
rh_lkid 4c730361
remlkid 52cd01b3
flags 0
status 0
rqmode 3
dlm: eva: process_lockqueue_reply id 52cd01b3 state 0

-node2:

dlm: lkb
id 43010219
remid 48330092
flags 0
status 3
rqmode 5
grmode 3
nodeid 2
lqstate 2
lqflags 44
dlm: request
rh_cmd 6
rh_lkid 48330092
remlkid 43010219
flags 0
status 0
rqmode 3
dlm: eva: process_lockqueue_reply id 43010219 state 0

Can someone explain what kind of faults are they?
   Regards,

Piotr

----------------------------------------------------------------------
Najlepsze auto, najlepsze moto... >>> http://link.interia.pl/f1841


From teigland at redhat.com  Wed Jan 12 11:56:06 2005
From: teigland at redhat.com (David Teigland)
Date: Wed, 12 Jan 2005 19:56:06 +0800
Subject: [Linux-cluster] Log entry
In-Reply-To: <20050112103849.3DD8B3031E3@poczta.interia.pl>
References: <20050112103849.3DD8B3031E3@poczta.interia.pl>
Message-ID: <20050112115606.GA12401@redhat.com>

On Wed, Jan 12, 2005 at 11:38:49AM +0100, ptr at poczta.fm wrote:
> 
>    Hello.
> I'm receiving entries like the one below in my system logs.
> It's 2-nodes cluster  built form CVS.
>
> -node1:
> 
> dlm: lkb
> id 52cd01b3
> remid 4c730361
> flags 0
> status 3
> rqmode 5
> grmode 3
> nodeid 1
> lqstate 2
> lqflags 44
> dlm: request
> rh_cmd 6
> rh_lkid 4c730361
> remlkid 52cd01b3
> flags 0
> status 0
> rqmode 3
> dlm: eva: process_lockqueue_reply id 52cd01b3 state 0

> Can someone explain what kind of faults are they?

They are a notice that an unexplained message reordering has taken place
and been corrected.  The log entries can be ignored.

-- 
Dave Teigland  <teigland at redhat.com>


From mshk_00 at hotmail.com  Wed Jan 12 11:59:56 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Wed, 12 Jan 2005 12:59:56 +0100
Subject: [Linux-cluster] mount file system GFS
Message-ID: <BAY20-F3541DDFC51016BE075F1C28D890@phx.gbl>

I follow the basic example C.3 of the administration Guide of Red Hat GFS 
6.0. :" LOCK_GULM SLM Embedded" with only two nodes to access a one file 
system shared that resides in a SAN (MJ) via iscsi.
I have installed red hat enterprise 3.0, kernel 2.4.21-15.0.4.EL, with the 
modules for GFS-6.0.0-7.
All go right (pools created and activated, cca created, ccsd launched, file 
system created with gfs_mkfs) except I can not mount the file system, the 
command mount not recognize the file system type gfs, the message appears on 
the console is:
    mount : type file system incorrect, option incorrect, superblock 
incorrect in /dev/pool/pool_gfs01 or number of file systems mounted 
excessive
/dev/pool/pool_gfs01 is pool created for file system, the device assigned is 
/dev/sdd2
Why can not mount the file system? what is wrong?

_________________________________________________________________
Acepta el reto MSN Premium: Protecci?n para tus hijos en internet. 
Desc?rgalo y pru?balo 2 meses gratis. 
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil


From mtilstra at redhat.com  Wed Jan 12 14:01:27 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Wed, 12 Jan 2005 08:01:27 -0600
Subject: [Linux-cluster] mount file system GFS
In-Reply-To: <BAY20-F3541DDFC51016BE075F1C28D890@phx.gbl>
References: <BAY20-F3541DDFC51016BE075F1C28D890@phx.gbl>
Message-ID: <20050112140127.GA2807@redhat.com>

On Wed, Jan 12, 2005 at 12:59:56PM +0100, maria perez wrote:
> I follow the basic example C.3 of the administration Guide of Red Hat GFS 
> 6.0. :" LOCK_GULM SLM Embedded" with only two nodes to access a one file 
> system shared that resides in a SAN (MJ) via iscsi.
> I have installed red hat enterprise 3.0, kernel 2.4.21-15.0.4.EL, with the 
> modules for GFS-6.0.0-7.
> All go right (pools created and activated, cca created, ccsd launched, file 
> system created with gfs_mkfs) except I can not mount the file system, the 
> command mount not recognize the file system type gfs, the message appears 
> on the console is:
>    mount : type file system incorrect, option incorrect, superblock 
> incorrect in /dev/pool/pool_gfs01 or number of file systems mounted 
> excessive
> /dev/pool/pool_gfs01 is pool created for file system, the device assigned 
> is /dev/sdd2
> Why can not mount the file system? what is wrong?

run dmesg to get more info about why it cannot mount.

Did you remember to start lock_gulmd?

-- 
Michael Conrad Tadpol Tilstra
The bug starts here.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050112/a135506b/attachment.sig>

From serge at triumvirat.ru  Wed Jan 12 14:10:40 2005
From: serge at triumvirat.ru (Sergey)
Date: Wed, 12 Jan 2005 17:10:40 +0300
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
References: <1125914338.20050111113424@triumvirat.ru>
	<7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
Message-ID: <1814439859.20050112171040@triumvirat.ru>

Hello!

> It looks like you are not using pool.

Thanks, I've guided by your examples, so raid can be mounted.

Now I have some questions about Cluster Configuration System Files.

I have 2 nodes - hp1 and hp2. Any of nodes have Integrated Lights-Out
with ROM Version: 1.55 - 04/16/2004.

Since I have only 2 nodes one of them has to be master, but if first
of them (master) is correctly shut down, slave experiencing
serious problems which can be solved by resetting. Is it all right?
How to make it right?

I tried to make servers = ["hp1","hp2","hp3"] (hp3 is really absent),
then if master is shut down second node became master. So, if
nodes are alternately correctly shut down and boot up master is
switching from one to another and everything seems ok, but if one of
the nodes is shut down incorrectly (e.g. power cord is pulled out of
socket), this have written in systemlog:

Jan 12 14:44:33 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530273952756 mb:1)
Jan 12 14:44:48 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530288972780 mb:2)
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530303992751 mb:3)
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Client (hp2) expired
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
Jan 12 14:45:03 hp1 lock_gulmd_core[6614]: Gonna exec fence_node hp2
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Forked [6614] fence_node hp2 with a 0 pause.
Jan 12 14:45:03 hp1 fence_node[6614]: Performing fence method, riloe, on hp2.
Jan 12 14:45:04 hp1 fence_node[6614]: The agent (fence_rib) reports:
Jan 12 14:45:04 hp1 fence_node[6614]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown
option "ipaddr=10.10.0.112"

If start again service lock_gulm on the second node, then on first
node this have written in systemlog:

Jan 12 14:50:14 hp1 lock_gulmd_core[7148]: Gonna exec fence_node hp2
Jan 12 14:50:14 hp1 fence_node[7148]: Performing fence method, riloe, on hp2.
Jan 12 14:50:14 hp1 fence_node[7148]: The agent (fence_rib) reports:
Jan 12 14:50:14 hp1 fence_node[7148]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown
option "ipaddr=10.10.0.112"
Jan 12 14:50:14 hp1 fence_node[7148]:
Jan 12 14:50:14 hp1 fence_node[7148]: All fencing methods FAILED!
Jan 12 14:50:14 hp1 fence_node[7148]: Fence of "hp2" was unsuccessful.
Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Fence failed. [7148] Exit code:1 Running it again.
Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Forked [7157] fence_node hp2 with a 5 pause.
Jan 12 14:50:15 hp1 lock_gulmd_core[6500]:  (10.10.0.201:hp2) Cannot login if you are expired.

And I can't umount GFS file system and can't reboot systems
because GFS is mounted, only reset both nodes.

I think I have mistakes in my configuration, may be it is because
incorrect agent = "fence_rib" or something else.

Please help :-)


Cluster Configuration:

cluster.ccs:
cluster {
         name = "cluster"
         lock_gulm {
             servers = ["hp1"]    (or servers = ["hp1,"hp2","hp3"])
         }
}

fence.ccs:
fence_devices {
                ILO-HP1 {
                        agent = "fence_rib"
                        ipaddr = "10.10.0.111"
                        login = "xx"
                        passwd = "xx"
                        }
                ILO-HP2 {
                        agent = "fence_rib"
                        ipaddr = "10.10.0.112"
                        login = "xx"
                        passwd = "xx"
                        }
            }

nodes.ccs:
nodes {
      hp1 {
          ip_interfaces { eth0 = "10.10.0.200" }
          fence { riloe { ILO-HP1 { localport = 17988 } } }
          }
      hp2 {
          ip_interfaces { eth0 = "10.10.0.201" }
          fence { riloe { ILO-HP2 { localport = 17988 } } }
          }
# if 3 nodes in cluster.ccs
#      hp3 {
#          ip_interfaces { eth0 = "10.10.0.201" }
#          fence { riloe { ILO-HP2 { localport = 17988 } } }
#          }


Thanks a lot anyway!
-- 
 Sergey


From mtilstra at redhat.com  Wed Jan 12 14:49:05 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Wed, 12 Jan 2005 08:49:05 -0600
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <1814439859.20050112171040@triumvirat.ru>
References: <1125914338.20050111113424@triumvirat.ru>
	<7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
	<1814439859.20050112171040@triumvirat.ru>
Message-ID: <20050112144905.GA3029@redhat.com>

On Wed, Jan 12, 2005 at 05:10:40PM +0300, Sergey wrote:
> Hello!
> 
> > It looks like you are not using pool.
> 
> Thanks, I've guided by your examples, so raid can be mounted.
> 
> Now I have some questions about Cluster Configuration System Files.
> 
> I have 2 nodes - hp1 and hp2. Any of nodes have Integrated Lights-Out
> with ROM Version: 1.55 - 04/16/2004.
> 
> Since I have only 2 nodes one of them has to be master, but if first
> of them (master) is correctly shut down, slave experiencing
> serious problems which can be solved by resetting. Is it all right?
> How to make it right?
> 
> I tried to make servers = ["hp1","hp2","hp3"] (hp3 is really absent),
> then if master is shut down second node became master. So, if

The nodes in the servers config line for gulm form a mini-cluster of
sorts.  There must be quorum (51%) of nodes present in this mini-cluster
for things to continue.

You must have two of the three servers up and running so that the
mini-cluster has quorum, which then will alow the other nodes to
connect.

> nodes are alternately correctly shut down and boot up master is
> switching from one to another and everything seems ok, but if one of
> the nodes is shut down incorrectly (e.g. power cord is pulled out of
> socket), this have written in systemlog:
> 
> Jan 12 14:44:33 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530273952756 mb:1)
> Jan 12 14:44:48 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530288972780 mb:2)
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530303992751 mb:3)
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Client (hp2) expired
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
> Jan 12 14:45:03 hp1 lock_gulmd_core[6614]: Gonna exec fence_node hp2
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Forked [6614] fence_node hp2 with a 0 pause.
> Jan 12 14:45:03 hp1 fence_node[6614]: Performing fence method, riloe, on hp2.
> Jan 12 14:45:04 hp1 fence_node[6614]: The agent (fence_rib) reports:
> Jan 12 14:45:04 hp1 fence_node[6614]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown
> option "ipaddr=10.10.0.112"
> 
> If start again service lock_gulm on the second node, then on first
> node this have written in systemlog:
> 
> Jan 12 14:50:14 hp1 lock_gulmd_core[7148]: Gonna exec fence_node hp2
> Jan 12 14:50:14 hp1 fence_node[7148]: Performing fence method, riloe, on hp2.
> Jan 12 14:50:14 hp1 fence_node[7148]: The agent (fence_rib) reports:
> Jan 12 14:50:14 hp1 fence_node[7148]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown
> option "ipaddr=10.10.0.112"
> Jan 12 14:50:14 hp1 fence_node[7148]:
> Jan 12 14:50:14 hp1 fence_node[7148]: All fencing methods FAILED!
> Jan 12 14:50:14 hp1 fence_node[7148]: Fence of "hp2" was unsuccessful.
> Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Fence failed. [7148] Exit code:1 Running it again.
> Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Forked [7157] fence_node hp2 with a 5 pause.
> Jan 12 14:50:15 hp1 lock_gulmd_core[6500]:  (10.10.0.201:hp2) Cannot login if you are expired.

The node hp2 has to be successfully fenced before it is allowed to
re-join the cluster.  If your fencing is misconfigured or not working, a
fenced node will never get to rejoin.

You really should test that fencing works by running 
fence_node <node name> for each node in your cluster before running
lock_gulmd.  This makes sure that fencing is setup and working
correctly.

Do that, and once you've verified that fencing is correct (without
lock_gulmd running) try things again with lock_gulmd.

> And I can't umount GFS file system and can't reboot systems
> because GFS is mounted, only reset both nodes.
> 
> I think I have mistakes in my configuration, may be it is because
> incorrect agent = "fence_rib" or something else.
> 
> Please help :-)

> 
> 
> Cluster Configuration:
> 
> cluster.ccs:
> cluster {
>          name = "cluster"
>          lock_gulm {
>              servers = ["hp1"]    (or servers = ["hp1,"hp2","hp3"])
>          }
> }
> 
> fence.ccs:
> fence_devices {
>                 ILO-HP1 {
>                         agent = "fence_rib"
>                         ipaddr = "10.10.0.111"
>                         login = "xx"
>                         passwd = "xx"
>                         }
>                 ILO-HP2 {
>                         agent = "fence_rib"
>                         ipaddr = "10.10.0.112"
>                         login = "xx"
>                         passwd = "xx"
>                         }
>             }
> 
> nodes.ccs:
> nodes {
>       hp1 {
>           ip_interfaces { eth0 = "10.10.0.200" }
>           fence { riloe { ILO-HP1 { localport = 17988 } } }
>           }
>       hp2 {
>           ip_interfaces { eth0 = "10.10.0.201" }
>           fence { riloe { ILO-HP2 { localport = 17988 } } }
>           }
> # if 3 nodes in cluster.ccs
> #      hp3 {
> #          ip_interfaces { eth0 = "10.10.0.201" }
> #          fence { riloe { ILO-HP2 { localport = 17988 } } }
> #          }

-- 
Michael Conrad Tadpol Tilstra
Hi, I'm an evil mutated signature virus, put me in your .sig or I will
bite your kneecaps!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050112/855c8f24/attachment.sig>

From mshk_00 at hotmail.com  Wed Jan 12 11:12:33 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Wed, 12 Jan 2005 12:12:33 +0100
Subject: [Linux-cluster] Mount GFS
Message-ID: <BAY20-F36CB662C861CBE2F8C892F8D890@phx.gbl>

I follow Basic example GFS C.3 (administration guide Red hat GFS 6.0) 
LOCK_GULM SLM Embedded  with only two nodes and only a file system, one of 
them running as server LOCK_GULM. My shared storage is a SAN, MJ, I access 
it via iscsi.
All go right, except I can not mount the file system gfs, mount not 
recognize the type of fyle system gfs, the message appear on the console is:
    mount : file system type incorrect, option incorrect, superblock 
incorrect in /dev/pool/pool_gfs01 or number of file systems mounted 
excessive

Why??

_________________________________________________________________
Moda para esta temporada. Ponte al d?a de todas las tendencias. 
http://www.msn.es/Mujer/moda/default.asp


From daniel at osdl.org  Wed Jan 12 17:44:22 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Wed, 12 Jan 2005 09:44:22 -0800
Subject: [Linux-cluster] cman bad generation number
In-Reply-To: <20050112085812.GI6645@tykepenguin.com>
References: <1103654081.29749.17.camel@ibm-c.pdx.osdl.net>
	<20041222090832.GB1260@tykepenguin.com>
	<1103736819.30947.20.camel@ibm-c.pdx.osdl.net>
	<20050111085608.GA6645@tykepenguin.com>
	<1105491645.30484.23.camel@ibm-c.pdx.osdl.net>
	<20050112085812.GI6645@tykepenguin.com>
Message-ID: <1105551862.30484.31.camel@ibm-c.pdx.osdl.net>

On Wed, 2005-01-12 at 00:58, Patrick Caulfield wrote:
> On Tue, Jan 11, 2005 at 05:00:46PM -0800, Daniel McNeil wrote:
> > On Tue, 2005-01-11 at 00:56, Patrick Caulfield wrote:
> > > On Wed, Dec 22, 2004 at 09:33:39AM -0800, Daniel McNeil wrote:
> > > > How long does cman stay up in your testing?
> > > 
> > > With the higher pririty on the heartbeat thread I got 5 days before iSCSI died
> > > on me again... This isn't quite the same load as yours but it is on 8 busy nodes.
> > 
> > I have not seen 5 days yet on my set.  See my email from yesterday.
> > Is the code to have higher priority for the heartbeat thread
> > already checked in?  I restarted my test yesterday and it is
> > still going, but it usually has trouble after 50 hours or so.
> > 
> 
> It's rev 1.45 of membership.c checked in on the 7th Jan. If that hasn't fixed it
> I'll have to dabble with realtime things as it does seem now that the threads
> are not being woken up, even though the timer is firing.

I'm running from code as of Jan 4th, so I do not have that change.
I'll updated my code.

2 nodes died last night running my tests with
echo "9" > /proc/cluster/config/cman/max_retries
echo "1" > /proc/cluster/config/cman/hello_timer

here's the output on the console from the 3 nodes:

cl030:
CMAN: no HELLO from cl031a, removing from the cluster
CMAN: node cl032a is not responding - removing from the cluster
CMAN: quorum lost, blocking activity

cl031:
CMAN: node cl030a is not responding - removing from the cluster
CMAN: node cl032a is not responding - removing from the cluster
                                                                                
SM:  Assertion failed on line 67 of file
/Views/redhat-cluster/cluster/cman-kernel/src/sm_membership.c
SM:  assertion:  "node"
SM:  time = 115176056
                                                                                
Kernel panic - not syncing: SM:  Record message above and reboot.
                                                                                
Message from syslogd at cl031 at Wed Jan 12 01:17:57 2005 ...
Record message above and reboot. syncing: SM:

cl032:
CMAN: too many transition restarts - will die

Daniel


From yazan at ccs.com.jo  Wed Jan 12 18:03:48 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Wed, 12 Jan 2005 20:03:48 +0200
Subject: [Linux-cluster] 3 questions
Message-ID: <000b01c4f8d1$10c00710$69050364@yazanz>

hi,    I have 3 question :

1- should i setup the temporary directory for GFS configuration files 
     on the two nodes or only on one node ?

_______________________________________________________
2- and if on the two nodes, should i run the :
      ( ccs_tool create.... ) command on the two nodes or only from one ?

_______________________________________________________


3- I have two members , and have build the cluster.ccs file as follow:

  cluster {
   name = "oracluster"
   lock_gulm {
         servers = [ "orat1"]
   }
}


  should i put the two members in the servers line or only 
  the first ( because the document example is about 4 nodes 
  and it had put only the first 3 nodes name)

   is that true or what ???????????//


Thanks
Yazan.


From amanthei at redhat.com  Wed Jan 12 18:04:16 2005
From: amanthei at redhat.com (Adam Manthei)
Date: Wed, 12 Jan 2005 12:04:16 -0600
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <1814439859.20050112171040@triumvirat.ru>
References: <1125914338.20050111113424@triumvirat.ru>
	<7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
	<1814439859.20050112171040@triumvirat.ru>
Message-ID: <20050112180416.GC32421@redhat.com>

On Wed, Jan 12, 2005 at 05:10:40PM +0300, Sergey wrote:
> I have 2 nodes - hp1 and hp2. Any of nodes have Integrated Lights-Out
> with ROM Version: 1.55 - 04/16/2004.
> 
> Jan 12 14:45:04 hp1 fence_node[6614]: The agent (fence_rib) reports:
> Jan 12 14:45:04 hp1 fence_node[6614]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown
> option "ipaddr=10.10.0.112"

Two things:
1. This is telling you to use an updated version of the agent, fence_ilo.
   replace fence_rib w/ fence_ilo in your ccs files

2. "ipaddr" is not a parameter for either fence_ilo or fence_rib.  The
   correct parameter is "hostname" (as described in the man page).

Hint:  You will also need perl-Crypt-SSLeay package from RHN or Net::SSLeay
from CPAN.

> Cluster Configuration:
> 
> cluster.ccs:
> cluster {
>          name = "cluster"
>          lock_gulm {
>              servers = ["hp1"]    (or servers = ["hp1,"hp2","hp3"])
>          }
> }
> 
> fence.ccs:
> fence_devices {
>                 ILO-HP1 {
>                         agent = "fence_rib"
>                         ipaddr = "10.10.0.111"
>                         login = "xx"
>                         passwd = "xx"
>                         }
>                 ILO-HP2 {
>                         agent = "fence_rib"
>                         ipaddr = "10.10.0.112"
>                         login = "xx"
>                         passwd = "xx"
>                         }
>             }
> 
> nodes.ccs:
> nodes {
>       hp1 {
>           ip_interfaces { eth0 = "10.10.0.200" }
>           fence { riloe { ILO-HP1 { localport = 17988 } } }
>           }
>       hp2 {
>           ip_interfaces { eth0 = "10.10.0.201" }
>           fence { riloe { ILO-HP2 { localport = 17988 } } }
>           }

-- 
Adam Manthei  <amanthei at redhat.com>


From sbasto at fe.up.pt  Wed Jan 12 19:35:24 2005
From: sbasto at fe.up.pt (=?ISO-8859-1?Q?S=E9rgio?= M. Basto)
Date: Wed, 12 Jan 2005 19:35:24 +0000
Subject: [Linux-cluster] other log entry
Message-ID: <1105558524.8704.8.camel@rh10.fe.up.pt>

Hi, 
with redhat AS 3 update3 I got on /var/log/messages

          04:02:48 samba-gfs lock_gulmd_core[10523]: "STONITH<->GuLM
Bridge" is logged out. fd:9
Jan  9 04:04:53 samba-gfs last message repeated 4 times
Jan  9 04:06:58 samba-gfs last message repeated 4 times
Jan  9 04:09:03 samba-gfs last message repeated 4 times
Jan  9 04:11:08 samba-gfs last message repeated 4 times
Jan  9 04:13:13 samba-gfs last message repeated 4 times
Jan  9 04:15:18 samba-gfs last message repeated 4 times

are this normal ?
what this means ? 

google don't find any !

thanks,
-- 
S?rgio M. B.


From mmiller at cruzverde.cl  Wed Jan 12 20:28:09 2005
From: mmiller at cruzverde.cl (Markus Miller)
Date: Wed, 12 Jan 2005 17:28:09 -0300
Subject: [Linux-cluster] RAW device limits
Message-ID: <75E9203E0F0DD511B37E00D0B789D45007E835AA@fcv-stgo.cverde.cl>

Hi,

I found a document on the oracle web site ...
http://oss.oracle.com/projects/ocfs/dist/documentation/RHAS_best_practices.html
.. that says that the maximum number of RAW devices supported by Red Hat AS 2.1 is 255. Does anybody know, if this limit still exists in Red Hat Enterprise 3?

I searched the Internet and found all kinds of limits (file size, filesystem size ...) but nothing about the amount of RAW devices soported in Red Hat Enterprise 3.

Regards,
Markus


Markus Miller
Ingeniero de Sistemas, RHCE
DIFARMA
Lord Cochrane 326, Santiago, Chile
Tel. +56 2 6944076
mmiller at cruzverde.cl


From rajkum2002 at rediffmail.com  Wed Jan 12 23:49:10 2005
From: rajkum2002 at rediffmail.com (Raj  Kumar)
Date: 12 Jan 2005 23:49:10 -0000
Subject: [Linux-cluster] 3 questions
Message-ID: <20050112234910.19523.qmail@webmail28.rediffmail.com>

>1- should i setup the temporary directory for GFS configuration files
>      on the two nodes or only on one node ?
ONE>


>_______________________________________________________
>2- and if on the two nodes, should i run the :
>       ( ccs_tool create.... ) command on the two nodes or only from one ?
>
>_______________________________________________________
>
ONLY FROM ONE NODE. THIS IS KIND OF GFS SETUP WHICH HAS TO BE DONE ONCE AND SO YOU RUN IT FROM ONLY ONE NODE.

>3- I have two members , and have build the cluster.ccs file as follow:
>
>   cluster {
>    name = "oracluster"
>    lock_gulm {
>          servers = [ "orat1"]
>    }
>}

THIS ENTRY INDICATES THE NODES RUNNING LOCK SERVERS (LOCK_GULMD). SINCE YOU HAVE ONLY TWO NODES YOU CAN RUN ONLY ONE LOCK SERVER AND YOU WILL PUT THE NAME OF THE NODE RUNNING LOCK SERVER HERE.

From GFS manual:

Because of quorum requirements, the number of lock servers allowed in a GFS cluster can be 1, 3, 4, or 5. Any other number of lock servers that is, 0, 2, or more than 5 is not supported.

Hope this helps!
Raj
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050112/48158dcf/attachment.htm>

From daniel at osdl.org  Thu Jan 13 00:47:59 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Wed, 12 Jan 2005 16:47:59 -0800
Subject: [Linux-cluster] Clusters special interest group (SIG)
Message-ID: <1105577279.5655.3.camel@ibm-c.pdx.osdl.net>

The purpose of the Clusters SIG is to provide a general Linux clusters
forum which is not specific to any one cluster project. Most of the
discussion takes place on the clusters_sig at osdl.org mailing list.

For information, the web page is here:
(http://developer.osdl.org/dev/clusters/)

To sign up for mailing list:
http://lists.osdl.org/mailman/listinfo/clusters_sig

Initial topics will most likely be (still up for discussion):

   - Common kernel components 
      * Code review for kernel hooks needed for in-kernel cluster
        services.
  * Sharing of common features between cluster implementations 
      * Fencing mechanisms
      * Resource management
      * Other cluster components (DLM, Membership, communication, etc)
  * SA Forum interfaces
  * OSDL working group capabilities/requirements
  * Customer and developer feedback on how open source clustering is
    being used and features that are needed or lacking.

Daniel


From mshk_00 at hotmail.com  Thu Jan 13 11:28:15 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Thu, 13 Jan 2005 12:28:15 +0100
Subject: Re Re: [Linux-cluster] mount file system GFS
Message-ID: <BAY20-F427CC9EB24DFD33F898CE88D8A0@phx.gbl>


   >Michael Conrad Tadpol Tilstra

   >run dmesg to get more info about why it cannot mount.

  >Did you remember to start lock_gulmd?

Certainly, thank you very much!
But now I have another problem.:Only the node stablished like server 
lock_gulm can mount the file system, the second node hang. Why??
The nodes' names are different and each node  the file /etc/hosts contains:

    127.0.0.1    localhost.localdomain      localhost
    127.0.0.1    machinename.domain    machinename
    (ip machine)   machinename.domain   machinename

I have another question: I have read the number of lock_gulm servers only 
can be 1, 3 or 5, not 2. Is right?
I am thinking established the two nodes like servers lock_gulm, will be this 
correct?

_________________________________________________________________
Acepta el reto MSN Premium: Correos m?s divertidos con fotos y textos 
incre?bles en MSN Premium. Desc?rgalo y pru?balo 2 meses gratis. 
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_correosmasdivertidos


From yazan at ccs.com.jo  Thu Jan 13 12:31:45 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Thu, 13 Jan 2005 14:31:45 +0200
Subject: [Linux-cluster] cluster suite question?
Message-ID: <000d01c4f96b$d8096dd0$69050364@yazanz>


hi,
 
  i am working from the beggining again.

 should i install the cluster suite on the two nodes or only on one node??

 Thanks.


From yazan at ccs.com.jo  Thu Jan 13 13:15:26 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Thu, 13 Jan 2005 15:15:26 +0200
Subject: [Linux-cluster] gfs probelm
Message-ID: <001501c4f971$f294df80$69050364@yazanz>


HI,

  I configure the gfs , and i mount the partitioned as gfs as mentioned in
the document, but when make a reboot, the system halted and stay ask
continuously :  lock_glumd is it running.

 i didnt put the mounted gfs partitions in the /etc/fstab.  ( is that true
?)

i made a shell and i put in it the following :
  service ccsd stop
  service lock_gulmd stop

  and i execut it before i make a reboot, and when i loged again to the
system the two services are running by the system, that is ok , i know it is
not a solution , BUT in the second reboot i found that the system gives the
same continuous error question ( lock_gulmd is it running? ).

how can i solve this ?
can i put the partitions in the /etc/fstab ?

OR WHAT ?????.

Thanks.


From serge at triumvirat.ru  Thu Jan 13 13:40:23 2005
From: serge at triumvirat.ru (Sergey)
Date: Thu, 13 Jan 2005 16:40:23 +0300
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <20050112144905.GA3029@redhat.com>
References: <1125914338.20050111113424@triumvirat.ru>
	<7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
	<1814439859.20050112171040@triumvirat.ru>
	<20050112144905.GA3029@redhat.com>
Message-ID: <2310360088.20050113164023@triumvirat.ru>


>> I have 2 nodes - hp1 and hp2. Any of nodes have Integrated Lights-Out
>> with ROM Version: 1.55 - 04/16/2004.
>> 

> The nodes in the servers config line for gulm form a mini-cluster of
> sorts.  There must be quorum (51%) of nodes present in this mini-cluster
> for things to continue.

> You must have two of the three servers up and running so that the
> mini-cluster has quorum, which then will alow the other nodes to
> connect.

I have only 2 nodes and I can't get quorum. Should I use Single Lock
Manager (SLM), when one node is master and another is slave?

But in this case if master goes down slave loses access to common file
system, and it systemlog looks like this:

Jan 13 15:56:59 hp2 kernel: lock_gulm: Checking for journals for node "hp1"
Jan 13 15:56:59 hp2 lock_gulmd_core[2935]: Master Node has logged out.
Jan 13 15:56:59 hp2 kernel: lock_gulm: Checking for journals for node "hp1"
Jan 13 15:56:59 hp2 lock_gulmd_core[2935]: In core_io.c:410 (v6.0.0) death by: Lost connection to SLM Master (hp1),
stopping. node reset required to re-activate cluster operations.
Jan 13 15:56:59 hp2 kernel: lock_gulm: ERROR Got an error in gulm_res_recvd err: -71
Jan 13 15:56:59 hp2 lock_gulmd_LTPX[2941]: EOF on xdr (_ core _:0.0.0.0 idx:1 fd:5)
Jan 13 15:56:59 hp2 lock_gulmd_LTPX[2941]: In ltpx_io.c:335 (v6.0.0) death by: Lost connection to core, cannot
continue. node reset required to re-activate cluster operations.
Jan 13 15:56:59 hp2 kernel: lock_gulm: ERROR gulm_LT_recver err -71
Jan 13 15:57:02 hp2 kernel: lock_gulm: ERROR Got a -111 trying to login to lock_gulmd.  Is it running?


status of lock_gulmd:

[root at hp2 root]# /etc/init.d/lock_gulmd status
lock_gulmd dead but subsys locked

If master boots up after some time happens nothing - slave does not
try to connect.

What should happens further and in what order?


> You really should test that fencing works by running 
> fence_node <node name> for each node in your cluster before running
> lock_gulmd.  This makes sure that fencing is setup and working
> correctly.

> Do that, and once you've verified that fencing is correct (without
> lock_gulmd running) try things again with lock_gulmd.

Result of command
fence_node NODENAME
is reboot of NODENAME. Is it right?


--
Sergey


From mtilstra at redhat.com  Thu Jan 13 14:10:26 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Thu, 13 Jan 2005 08:10:26 -0600
Subject: Re Re: [Linux-cluster] mount file system GFS
In-Reply-To: <BAY20-F427CC9EB24DFD33F898CE88D8A0@phx.gbl>
References: <BAY20-F427CC9EB24DFD33F898CE88D8A0@phx.gbl>
Message-ID: <20050113141026.GA19979@redhat.com>

On Thu, Jan 13, 2005 at 12:28:15PM +0100, maria perez wrote:
> 
> 
>   >Michael Conrad Tadpol Tilstra
> 
>   >run dmesg to get more info about why it cannot mount.
> 
>  >Did you remember to start lock_gulmd?
> 
> Certainly, thank you very much!
> But now I have another problem.:Only the node stablished like server 
> lock_gulm can mount the file system, the second node hang. Why??
> The nodes' names are different and each node  the file /etc/hosts contains:
> 
>    127.0.0.1    localhost.localdomain      localhost
>    127.0.0.1    machinename.domain    machinename

If your /etc/hosts file is actually setting the ip of nodes in your
cluster to 127.0.0.1, lock_gulmd will not work.


> I have another question: I have read the number of lock_gulm servers only 
> can be 1, 3 or 5, not 2. Is right?

yes.  3,4, and 5 servers run in a mini cluster to avoid a single point of
failure.  1 server runs as a single point of failure, but is useful for
testing.

> I am thinking established the two nodes like servers lock_gulm, will be 
> this correct?

I am sorry, but I don't quite understand this question.

You can setup two nodes, both as servers, (by putting three nodes in the
servers list and not using one of the entries.)  But if one node dies
the other will hang.  It can be done, with gulm it is not advisable.


-- 
Michael Conrad Tadpol Tilstra
I am having an out of money experience. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050113/9b6ef54c/attachment.sig>

From mtilstra at redhat.com  Thu Jan 13 14:18:24 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Thu, 13 Jan 2005 08:18:24 -0600
Subject: [Linux-cluster] some questions about setting up GFS
In-Reply-To: <2310360088.20050113164023@triumvirat.ru>
References: <1125914338.20050111113424@triumvirat.ru>
	<7B452B98-63F6-11D9-85A8-000A957BB1F6@redhat.com>
	<1814439859.20050112171040@triumvirat.ru>
	<20050112144905.GA3029@redhat.com>
	<2310360088.20050113164023@triumvirat.ru>
Message-ID: <20050113141824.GB19979@redhat.com>

On Thu, Jan 13, 2005 at 04:40:23PM +0300, Sergey wrote:
> 
> >> I have 2 nodes - hp1 and hp2. Any of nodes have Integrated Lights-Out
> >> with ROM Version: 1.55 - 04/16/2004.
> >> 
> 
> > The nodes in the servers config line for gulm form a mini-cluster of
> > sorts.  There must be quorum (51%) of nodes present in this mini-cluster
> > for things to continue.
> 
> > You must have two of the three servers up and running so that the
> > mini-cluster has quorum, which then will alow the other nodes to
> > connect.
> 
> I have only 2 nodes and I can't get quorum. Should I use Single Lock
> Manager (SLM), when one node is master and another is slave?
> 
> But in this case if master goes down slave loses access to common file
> system, and it systemlog looks like this:

Correct. That is the behavor of gulm in SLM mode.


[snip]

> If master boots up after some time happens nothing - slave does not
> try to connect.

Again correct, in SLM mode, the lock state was lost, so there is nothing
for the slave to reconnect to.

For gulm, you need atleast three nodes to get RLM mode.  The third gulm
node does not need to run anything but gulm, and can be configured from
a file using an option to ccsd.  You just need to make sure the configs
are the same on all three nodes.

> What should happens further and in what order?
> 
> 
> > You really should test that fencing works by running 
> > fence_node <node name> for each node in your cluster before running
> > lock_gulmd.  This makes sure that fencing is setup and working
> > correctly.
> 
> > Do that, and once you've verified that fencing is correct (without
> > lock_gulmd running) try things again with lock_gulmd.
> 
> Result of command
> fence_node NODENAME
> is reboot of NODENAME. Is it right?

If you are using a fencing agent that power cycles the node. (so,
sometimes yes.  fence_ilo will reboot the node.)

-- 
Michael Conrad Tadpol Tilstra
IIss  llooccaall  eecchhoo  oonn??
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050113/22ac05a6/attachment.sig>

From tboucher at ca.ibm.com  Thu Jan 13 14:25:53 2005
From: tboucher at ca.ibm.com (Tony Boucher)
Date: Thu, 13 Jan 2005 09:25:53 -0500
Subject: [Linux-cluster] Log entry
In-Reply-To: <20050112103849.3DD8B3031E3@poczta.interia.pl>
Message-ID: <OF9C0EE2D9.563892CF-ON85256F87.004A8BE4-85256F88.004F4411@ca.ibm.com>

There not errors. It looks like you have verbose logging set.


Tony Boucher 
I/T Specialist (HACMP/GPFS/WLM)

"Experience is a hard teacher because she gives the test first, the lesson 
afterwards." -- Unknown


ptr at poczta.fm 
Sent by: linux-cluster-bounces at redhat.com
01/12/2005 05:38 AM
Please respond to
linux clistering


To
linux-cluster at redhat.com
cc

Subject
[Linux-cluster] Log entry


   Hello.
I'm receiving entries like the one below in my system logs.
It's 2-nodes cluster  built form CVS.

-node1:

dlm: lkb
id 52cd01b3
remid 4c730361
flags 0
status 3
rqmode 5
grmode 3
nodeid 1
lqstate 2
lqflags 44
dlm: request
rh_cmd 6
rh_lkid 4c730361
remlkid 52cd01b3
flags 0
status 0
rqmode 3
dlm: eva: process_lockqueue_reply id 52cd01b3 state 0

-node2:

dlm: lkb
id 43010219
remid 48330092
flags 0
status 3
rqmode 5
grmode 3
nodeid 2
lqstate 2
lqflags 44
dlm: request
rh_cmd 6
rh_lkid 48330092
remlkid 43010219
flags 0
status 0
rqmode 3
dlm: eva: process_lockqueue_reply id 43010219 state 0

Can someone explain what kind of faults are they?
   Regards,

Piotr

----------------------------------------------------------------------
Najlepsze auto, najlepsze moto... >>> http://link.interia.pl/f1841

--
Linux-cluster mailing list
Linux-cluster at redhat.com
http://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050113/5c013fee/attachment.htm>

From pcaulfie at redhat.com  Thu Jan 13 15:41:20 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 13 Jan 2005 15:41:20 +0000
Subject: [Linux-cluster] Simple wrap for SAF AIS Lock API
In-Reply-To: <1102610496.4843.16.camel@manticore.sh.intel.com>
References: <1102610496.4843.16.camel@manticore.sh.intel.com>
Message-ID: <20050113154120.GE2346@tykepenguin.com>

On Thu, Dec 09, 2004 at 04:41:36PM +0000, Stanley Wang wrote:
> Hi all,
> 
> The attached patch provides SAF AIS lock APIs support based on current
> GDLM. It's just simplest wrap of GDLM's user mode api and didn't touch
> GDLM's codes. I think it can be a good complementarity to GDLM.
> 
> The patch is against lastest CVS codes.
> 
> Any interests or comments?

Now committed to CVS head. Sorry for the rather long delay.
-- 

patrick


From mshk_00 at hotmail.com  Fri Jan 14 09:22:37 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Fri, 14 Jan 2005 10:22:37 +0100
Subject: : Re: Re Re: [Linux-cluster] mount file system GFS
Message-ID: <BAY20-F413FD384B77A724C1A0A008D8B0@phx.gbl>

Thank you very much for your help, Michael.
Excuse me, but my english is not enough good. I try write correctly in an 
understable way,but not always I achieve it.

Finally, with your help, I achieved mount a file system shared by two nodes, 
only one of this is running like lock_gulmd server. I had to eliminate the 
lines in /etc/hosts file of each node, that contain 127.0.0.1. Someone said 
to me in a occasion that never eliminate the line:
           '127.0.0.1  localhost.localdomain    localhost'
What can it happen?? what problems can appear??

My system now had a single point of failure, I would like, if it is possible 
that the two nodes were servers lock_gulm . I understand in your message I 
can run the two nodes like servers lock_gulm having only two nodes but 
declaring three nodes in the file cluster.ccs in the sentence  servers=" " 
and using only two of the three (really the third node not exits). In the 
file nodes.ccs : I had to declare the three nodes too??Nor?? Do I undersand 
you well??

_________________________________________________________________
Acepta el reto MSN Premium: Protecci?n para tus hijos en internet. 
Desc?rgalo y pru?balo 2 meses gratis. 
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil


From Axel.Thimm at ATrpms.net  Fri Jan 14 10:57:24 2005
From: Axel.Thimm at ATrpms.net (Axel Thimm)
Date: Fri, 14 Jan 2005 11:57:24 +0100
Subject: [Linux-cluster] Re: CVS compile with 2.6.10-rc3
In-Reply-To: <20041210221412.GA26453@potassium.msp.redhat.com>
References: <20041210215924.GA11520@iwork57.lis.uiuc.edu>
	<20041210221412.GA26453@potassium.msp.redhat.com>
Message-ID: <20050114105724.GB5419@neu.nirvana>

The fix is in CVS now, thanks!

Now that FC2/FC3 have gone 2.6.10, the rawhide GFS-kernel packages
break (they also break against rawhide's 2.6.10). Could a new gfs CVS
checkout be committed into rawhide?

I'm preparing packages for FC3 (perhaps even FC2) and want to be
source-wise as close to rawhide/rhel4 as possible.

Thanks!

On Fri, Dec 10, 2004 at 04:14:12PM -0600, Ken Preslan wrote:
> It looks like every other driver in the rc3 patch just drops the "0"
> argument to that function.  Go ahead and try it and see what you get.
> 
> 
> On Fri, Dec 10, 2004 at 03:59:25PM -0600, Brynnen R Owen wrote:
> > Hi all,
> > 
> >   This may be off your radar still, but it appears that the CVS source
> > fails to compile with vanilla 2.6.10-rc3.  The smoking source file is
> > cluster/gfs-kernel/src/gfs/quota.c:
> > 
> > CC [M]  /mnt/install/src-2.6.10-rc3-gfs32-1/cluster/gfs-kernel/src/gfs/quota.o
> > /mnt/install/src-2.6.10-rc3-gfs32-1/cluster/gfs-kernel/src/gfs/quota.c:
> >   In function `print_quota_message':
> > /mnt/install/src-2.6.10-rc3-gfs32-1/cluster/gfs-kernel/src/gfs/quota.c:956:
> >   warning: passing arg 3 of pointer to function makes integer from
> >   pointer without a cast
> > /mnt/install/src-2.6.10-rc3-gfs32-1/cluster/gfs-kernel/src/gfs/quota.c:956:
> >   too many arguments to function
> > 
> > Did the kernel API for tty access change?
-- 
Axel.Thimm at ATrpms.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050114/1b1d5d31/attachment.sig>

From cjkovacs at verizon.net  Fri Jan 14 11:29:38 2005
From: cjkovacs at verizon.net (Corey Kovacs)
Date: Fri, 14 Jan 2005 06:29:38 -0500
Subject: : Re: Re Re: [Linux-cluster] mount file system GFS
In-Reply-To: <BAY20-F413FD384B77A724C1A0A008D8B0@phx.gbl>
References: <BAY20-F413FD384B77A724C1A0A008D8B0@phx.gbl>
Message-ID: <200501140629.39107.cjkovacs@verizon.net>

You don't have to remove the loopback line, only the reference to the machines host name
in that line...


so instead of having....

127.0.0.1        mymachinename localhost.localdomain   localhost

You only need/want....

127.0.0.1        localhost.localdomain   localhost
192.168.1.1 mymachinename.mydomain.com mymachinename

of course you'll use your correct ip address, etc....

having the host name in the loopback line causes all sorts of problems with
other things as well and I am not sure why it's put there in the first place.


Cheers.


Corey


On Friday 14 January 2005 04:22, maria perez wrote:
> Thank you very much for your help, Michael.
> Excuse me, but my english is not enough good. I try write correctly in an 
> understable way,but not always I achieve it.
> 
> Finally, with your help, I achieved mount a file system shared by two nodes, 
> only one of this is running like lock_gulmd server. I had to eliminate the 
> lines in /etc/hosts file of each node, that contain 127.0.0.1. Someone said 
> to me in a occasion that never eliminate the line:
>            '127.0.0.1  localhost.localdomain    localhost'
> What can it happen?? what problems can appear??
> 
> My system now had a single point of failure, I would like, if it is possible 
> that the two nodes were servers lock_gulm . I understand in your message I 
> can run the two nodes like servers lock_gulm having only two nodes but 
> declaring three nodes in the file cluster.ccs in the sentence  servers=" " 
> and using only two of the three (really the third node not exits). In the 
> file nodes.ccs : I had to declare the three nodes too??Nor?? Do I undersand 
> you well??
> 
> _________________________________________________________________
> Acepta el reto MSN Premium: Protecci?n para tus hijos en internet. 
> Desc?rgalo y pru?balo 2 meses gratis. 
> http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
> 


From mshk_00 at hotmail.com  Fri Jan 14 12:31:54 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Fri, 14 Jan 2005 13:31:54 +0100
Subject: : Re: Re Re: [Linux-cluster] mount file system GFS
Message-ID: <BAY20-F3366F678D4F2ED5298FE038D8B0@phx.gbl>

Thanks, It is certain. At last I achieved mount the file system from two 
nodes and the file /etc/hosts maintains 127.0.0.1 
localhost...................

I included the line 127.0.0.1 mymachinename ...... because I understood 
wrong a message that I read
and I decided to probe with it.

Now I know the problem was in the file /etc/hosts of each machine does not 
appear a line including the ip and name of the other machine.

Regards

>You don't have to remove the loopback line, only the reference to the 
>machines host name
>in that line...
>so instead of having....
>
>127.0.0.1        mymachinename localhost.localdomain   localhost
>
>You only need/want....

>127.0.0.1        localhost.localdomain   localhost
>192.168.1.1 mymachinename.mydomain.com mymachinename

>of course you'll use your correct ip address, etc....
>having the host name in the loopback line causes all sorts of problems with
>other things as well and I am not sure why it's put there in the first 
>place.

>Cheers.

>Corey

_________________________________________________________________
Descarga gratis la Barra de Herramientas de MSN 
http://www.msn.es/usuario/busqueda/barra?XAPID=2031&DI=1055&SU=http%3A//www.hotmail.com&HL=LINKTAG1OPENINGTEXT_MSNBH


From mshk_00 at hotmail.com  Fri Jan 14 12:47:00 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Fri, 14 Jan 2005 13:47:00 +0100
Subject: [Linux-cluster] gfs probelm
Message-ID: <BAY20-F3755008E2C02111EBE9E868D8B0@phx.gbl>


>HI,
>
 > I configure the gfs , and i mount the partitioned as gfs as mentioned in
>the document, but when make a reboot, the system halted and stay ask
>continuously :  lock_glumd is it running.

 >i didnt put the mounted gfs partitions in the /etc/fstab.  ( is that true
>?)

>i made a shell and i put in it the following :
>  service ccsd stop
>  service lock_gulmd stop

>  and i execut it before i make a reboot, and when i loged again to the
>system the two services are running by the system, that is ok , i know it 
>is
>not a solution , BUT in the second reboot i found that the system gives the
>same continuous error question ( lock_gulmd is it running? ).

>how can i solve this ?
>can i put the partitions in the /etc/fstab ?

>OR WHAT ?????.

I don't know many nodes you have running, and if all nodes are lock_gulm 
servers or only one of them. Maybe if you reboot a node that is a server 
lock_gulm without stop this services (gfs, lock_gulmd, ccsd) and other nodes 
that are running depens of that node.

have you created the archive /etc/sysconfig/gfs ???

!!The order to stop the modules is: gfs, lock_gulmd, ccsd, pool.
Sorry I can not help you more.
   Good luck!

_________________________________________________________________
Acepta el reto MSN Premium: Protecci?n para tus hijos en internet. 
Desc?rgalo y pru?balo 2 meses gratis. 
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil


From mtilstra at redhat.com  Fri Jan 14 15:43:41 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Fri, 14 Jan 2005 09:43:41 -0600
Subject: : Re: Re Re: [Linux-cluster] mount file system GFS
In-Reply-To: <BAY20-F413FD384B77A724C1A0A008D8B0@phx.gbl>
References: <BAY20-F413FD384B77A724C1A0A008D8B0@phx.gbl>
Message-ID: <20050114154341.GC24596@redhat.com>

On Fri, Jan 14, 2005 at 10:22:37AM +0100, maria perez wrote:
> Thank you very much for your help, Michael.
> Excuse me, but my english is not enough good. I try write correctly in an 
> understable way,but not always I achieve it.

Yeah, no worries. I've been speaking english all my life, and I still
screw it up regularly.  ^_^

[snipped what got answered by others]

> My system now had a single point of failure, I would like, if it is 
> possible that the two nodes were servers lock_gulm . I understand in your 
> message I can run the two nodes like servers lock_gulm having only two 
> nodes but declaring three nodes in the file cluster.ccs in the sentence  
> servers=" " and using only two of the three (really the third node not 
> exits). In the file nodes.ccs : I had to declare the three nodes too??Nor?? 
> Do I undersand you well??

Yes, that right.  With this setup, one node will stop when the other
dies.  But you will not need to reboot both, just the one that died.
Not an ideal situation, but a little better.

All this comes from the fact that gulm was not designed with small
in mind.

-- 
Michael Conrad Tadpol Tilstra
Chemicals, n.: Noxious substances from which modern foods are made.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050114/91d4edf0/attachment.sig>

From tom at nethinks.com  Fri Jan 14 10:41:34 2005
From: tom at nethinks.com (tom at nethinks.com)
Date: Fri, 14 Jan 2005 11:41:34 +0100
Subject: [Linux-cluster] mail-cluster + gfs setup?
Message-ID: <OF6FC34889.5A5E1EF6-ONC1256F89.00389480-C1256F89.003ABD64@nethinks.com>


Hi all,

does somebody here on the list have successfully setup a cyrus pop3/imap
mail cluster?

We are running for 2 month a test setup and we are disappointed of the
performance.

We already updatet gfs to the last cvs version but the performance of
reading the berkley db are more then poor
it takes more then 5 min to initalize the db with two servers connectet to
the gfs filesystem.

Many thx.

-tom


From bujan at isqsolutions.com  Fri Jan 14 17:58:07 2005
From: bujan at isqsolutions.com (Manuel Bujan)
Date: Fri, 14 Jan 2005 12:58:07 -0500
Subject: [Linux-cluster] Which APC fence device ?
Message-ID: <005c01c4fa62$9b438910$7801a8c0@pcbujan>

Hi,

Could any of you guys can recommend me a working APC Masterswitch model to 
use as a fencing device for our two node GFS cluster ?

We are planning to go in production by the next month and we were using 
until now the fencing manual mechanism.

I looked inside the APC site and I found different models, but I am not 
certainly sure which one to select to be compatible with the fence_apc 
program.

Does fence_apc work with ethernet power switches from APC like the model 
AP7900 ?
http://www.apc.com/products/family/index.cfm?id=70

Any suggestions

Regards
Bujan


From bujan at isqsolutions.com  Fri Jan 14 18:04:20 2005
From: bujan at isqsolutions.com (Manuel Bujan)
Date: Fri, 14 Jan 2005 13:04:20 -0500
Subject: [Linux-cluster] mail-cluster + gfs setup?
References: <OF6FC34889.5A5E1EF6-ONC1256F89.00389480-C1256F89.003ABD64@nethinks.com>
Message-ID: <005f01c4fa63$7ad0ff40$7801a8c0@pcbujan>

yes,

We are now testing a two-node installation using Postfix + Cyrus Imap/Pop3 + 
MySQL + Apache without major problems.

I recomend you to use a MailDir style mailbox and disable the sorting and 
threading features that were enabled in the Cyrus IMAP installation by 
default.

Regards
Bujan

----- Original Message ----- 
From: <tom at nethinks.com>
To: <linux-cluster at redhat.com>
Sent: Friday, January 14, 2005 5:41 AM
Subject: [Linux-cluster] mail-cluster + gfs setup?


>
>
>
>
> Hi all,
>
> does somebody here on the list have successfully setup a cyrus pop3/imap
> mail cluster?
>
> We are running for 2 month a test setup and we are disappointed of the
> performance.
>
> We already updatet gfs to the last cvs version but the performance of
> reading the berkley db are more then poor
> it takes more then 5 min to initalize the db with two servers connectet to
> the gfs filesystem.
>
> Many thx.
>
> -tom
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
> 


From lhh at redhat.com  Fri Jan 14 21:26:50 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 14 Jan 2005 16:26:50 -0500
Subject: [Linux-cluster] Which APC fence device ?
In-Reply-To: <005c01c4fa62$9b438910$7801a8c0@pcbujan>
References: <005c01c4fa62$9b438910$7801a8c0@pcbujan>
Message-ID: <1105738010.9279.203.camel@ayanami.boston.redhat.com>

On Fri, 2005-01-14 at 12:58 -0500, Manuel Bujan wrote:

> Does fence_apc work with ethernet power switches from APC like the model 
> AP7900 ?
> http://www.apc.com/products/family/index.cfm?id=70

I think it works with the 7900 and 7921; not 100% on that.

-- Lon


From yazan at ccs.com.jo  Sat Jan 15 06:15:25 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Sat, 15 Jan 2005 08:15:25 +0200
Subject: [Linux-cluster] gfs probelm
References: <BAY20-F3755008E2C02111EBE9E868D8B0@phx.gbl>
Message-ID: <00bc01c4fac9$9a225bd0$69050364@yazanz>


i have two nodes , orat1 , orat2 .
   and i make the lock_gulmd on the tow nodes because the document i have
says that i should run the the ccsd then the lock_gulmd on the two nodes.

and i put the first node as lock_gulm in the cluster.ccs file.

Thanks.
----- Original Message ----- 
From: "maria perez" <mshk_00 at hotmail.com>
To: <linux-cluster at redhat.com>
Sent: Friday, January 14, 2005 2:47 PM
Subject: Re:[Linux-cluster] gfs probelm


>
>
>
> >HI,
> >
>  > I configure the gfs , and i mount the partitioned as gfs as mentioned
in
> >the document, but when make a reboot, the system halted and stay ask
> >continuously :  lock_glumd is it running.
>
>  >i didnt put the mounted gfs partitions in the /etc/fstab.  ( is that
true
> >?)
>
> >i made a shell and i put in it the following :
> >  service ccsd stop
> >  service lock_gulmd stop
>
> >  and i execut it before i make a reboot, and when i loged again to the
> >system the two services are running by the system, that is ok , i know it
> >is
> >not a solution , BUT in the second reboot i found that the system gives
the
> >same continuous error question ( lock_gulmd is it running? ).
>
> >how can i solve this ?
> >can i put the partitions in the /etc/fstab ?
>
> >OR WHAT ?????.
>
> I don't know many nodes you have running, and if all nodes are lock_gulm
> servers or only one of them. Maybe if you reboot a node that is a server
> lock_gulm without stop this services (gfs, lock_gulmd, ccsd) and other
nodes
> that are running depens of that node.
>
> have you created the archive /etc/sysconfig/gfs ???
>
> !!The order to stop the modules is: gfs, lock_gulmd, ccsd, pool.
> Sorry I can not help you more.
>    Good luck!
>
> _________________________________________________________________
> Acepta el reto MSN Premium: Protecci?n para tus hijos en internet.
> Desc?rgalo y pru?balo 2 meses gratis.
>
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From yazan at ccs.com.jo  Sat Jan 15 07:13:44 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Sat, 15 Jan 2005 09:13:44 +0200
Subject: [Linux-cluster] gfs probelm
References: <BAY20-F3755008E2C02111EBE9E868D8B0@phx.gbl>
Message-ID: <00c101c4fad1$bff3c7b0$69050364@yazanz>

no i didnt create the archive /etc/sysconfig/gfs ??? yet.

 actually i dont know how will i create it, im using a GFS document and that
is not mention in the doc.

how can i build it and what can i put in it ?

 Thanks.


----- Original Message ----- 
From: "maria perez" <mshk_00 at hotmail.com>
To: <linux-cluster at redhat.com>
Sent: Friday, January 14, 2005 2:47 PM
Subject: Re:[Linux-cluster] gfs probelm


>
>
>
> >HI,
> >
>  > I configure the gfs , and i mount the partitioned as gfs as mentioned
in
> >the document, but when make a reboot, the system halted and stay ask
> >continuously :  lock_glumd is it running.
>
>  >i didnt put the mounted gfs partitions in the /etc/fstab.  ( is that
true
> >?)
>
> >i made a shell and i put in it the following :
> >  service ccsd stop
> >  service lock_gulmd stop
>
> >  and i execut it before i make a reboot, and when i loged again to the
> >system the two services are running by the system, that is ok , i know it
> >is
> >not a solution , BUT in the second reboot i found that the system gives
the
> >same continuous error question ( lock_gulmd is it running? ).
>
> >how can i solve this ?
> >can i put the partitions in the /etc/fstab ?
>
> >OR WHAT ?????.
>
> I don't know many nodes you have running, and if all nodes are lock_gulm
> servers or only one of them. Maybe if you reboot a node that is a server
> lock_gulm without stop this services (gfs, lock_gulmd, ccsd) and other
nodes
> that are running depens of that node.
>
> have you created the archive /etc/sysconfig/gfs ???
>
> !!The order to stop the modules is: gfs, lock_gulmd, ccsd, pool.
> Sorry I can not help you more.
>    Good luck!
>
> _________________________________________________________________
> Acepta el reto MSN Premium: Protecci?n para tus hijos en internet.
> Desc?rgalo y pru?balo 2 meses gratis.
>
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From yazan at ccs.com.jo  Sat Jan 15 07:27:42 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Sat, 15 Jan 2005 09:27:42 +0200
Subject: [Linux-cluster] cluster suite question?
Message-ID: <01db01c4fad3$b366d1c0$69050364@yazanz>

hi,

should i install the cluster suite on the two nodes or only on one node??

  my problem is that when i installed the cluster suit on the two nodes then
i found that the gui of the cluster suit didnt have the quorum check box
checked even i configure the /etc/sysconfig/rawdevices , and i put in it two
partition each of 100MB as raw1 and raw2 and i used them but with the same
problem .

 firstly i configured the GFS and then i installed ther cluster suit an im
haveing this problem , and because of that i asked if can i installed it on
the two nodes or not ?????????.


Thanks.


From mshk_00 at hotmail.com  Mon Jan 17 08:48:04 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Mon, 17 Jan 2005 09:48:04 +0100
Subject: [Linux-cluster] gfs probelm
Message-ID: <BAY20-F93153BF1CCC72A8CE9ABD8D8E0@phx.gbl>

>no i didnt create the archive /etc/sysconfig/gfs ??? yet.

 >actually i dont know how will i create it, im using a GFS document and 
that
>is not mention in the doc.

>how can i build it and what can i put in it ?

>Thanks.

The only you have to do is: since a console of the system write:

[root at machinename]# nano /etc/sysconfig/gfs
then appear in the console this archive and write into it something like 
this;

   POOLS="name_pool1 name_pool2 ...name_pooln"
   CCS_ARCHIVE="/dev/pool/name_pool_cluster"

(name_pool_cluster: pool created for the archive cca of the cluster that you 
are usign, in the administration guide of GFS6.0 appear like alpha_cca )

(You can create the file in other way of course)

With this archive the system on the boot or reboot try to stop ther 
services: gfs, lock_gulmd, ccsd and pool (I believe it!!!!)

maria

_________________________________________________________________
Hor?scopo, tarot, numerolog?a... Escucha lo que te dicen los astros. 
http://astrocentro.msn.es/


From yazan at ccs.com.jo  Mon Jan 17 09:20:17 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Mon, 17 Jan 2005 11:20:17 +0200
Subject: [Linux-cluster] lvm with gfs
Message-ID: <000b01c4fc75$c25378c0$69050364@yazanz>

hi,

The shared that i have is raid5, and i have completedthe partitions i want
and i formated them as gfs and mount them and everything is OK.

 but now , i want to setup rawdevices, and i want to put each rawdevice in
one partion , so i have taken the free space from the shared and made it an
lvm partition, and then i partioned the new lvm partition into 19 partitions
:
 /dev/vg0/r1........ /dev/vg0/r18
and i want to format them as gfs filesystem

i used the same procedure used from the beggining
so i made another partition named as /dev/vg0/newgfs (to use as CCS file)

but when i make : pool_tool -c to any of the new partitions it says that (
Unable to open device
"/dev/lvm(a-s)"  then it writes pool label written successfully from the
file.

when i run  pool_tool -s
it writes an <ERROR> with the new partitions.

This is the problem.

?????

Thanks


From yazan at ccs.com.jo  Mon Jan 17 09:57:15 2005
From: yazan at ccs.com.jo (Yazan Al-Sheyyab)
Date: Mon, 17 Jan 2005 11:57:15 +0200
Subject: [Linux-cluster] pool
Message-ID: <000701c4fc7a$ec88bb50$69050364@yazanz>

hi,

 can i put partitions created fom LVM as
  /dev/vg0/r1...../dev/vg0/r18

    into pools ?  and how?
 is it the same procedure ?

how to put them in pool so that when i run
 pool_tool -s      it doesnot give <error> to them.

Thanks


From daniel at osdl.org  Tue Jan 18 01:31:33 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Mon, 17 Jan 2005 17:31:33 -0800
Subject: [Linux-cluster] cluster failed after 53 hours
Message-ID: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>

My 3 node cluster ran tests for 53 hours before hitting a problem.


Node cl031 hit the 1st problem CMAN: killed by STARTTRANS or
NOMINATE.  There is a DLM assert on cl031 also, but that is
after a whole bunch of debug output.  The full logs are
here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)

Any ideas on what is going on?

Here is simplified output (in the README file):
test started Jan Wed 12 17:18
hung after Fri Jan 14 22:00

cl031 got an error in just under 53 hours.
==========================================
Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages
Jan 14 22:00:38 cl031 kernel: CMAN: killed by STARTTRANS or NOMINATE
Jan 14 22:00:38 cl031 kernel: CMAN: we are leaving the cluster.
Jan 14 22:00:38 cl031 kernel: name "       2          54aef1" flags 2 nodeid 0 ref 1
Jan 14 22:00:38 cl031 kernel: G 0029017f gr 5 rq -1 flg 0 sts 2 node 0 remid 0 lq 0,5
[snip 34980 lines]
Jan 14 22:10:07 cl031 kernel: G 00010165 gr 5 rq -1 flg 0 sts 2 node 0 remid 0 lq 0,5
Jan 14 22:10:07 cl031 kernel:  3 to 3 id 432
Jan 14 22:10:07 cl031 kernel: stripefs updated 350 resources
Jan 14 22:10:07 cl031 kernel: stripefs rebuild locks
Jan 14 22:10:07 cl031 kernel: stripefs rebuilt 0 locks
Jan 14 22:10:07 cl031 kernel: stripefs recover event 6122 done
Jan 14 22:10:07 cl031 kernel: stripefs rcom status f to 3
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 433
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 434
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 435
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 436
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 437
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 438
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 439
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 440
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 441
Jan 14 22:10:07 cl031 kernel: stripefs rcom send 6 to 3 id 442
Jan 14 22:10:07 cl031 kernel: stripefs move flags 0,0,1 ids 6119,6122,6122
Jan 14 22:10:07 cl031 kernel: stripefs process held requests
Jan 14 22:10:07 cl031 kernel: stripefs processed 0 requests
Jan 14 22:10:07 cl031 kernel: stripefs resend marked requests
Jan 14 22:10:07 cl031 kernel: stripefs resent 0 requests
Jan 14 22:10:07 cl031 kernel: stripefs recover event 6122 finished
Jan 14 22:10:07 cl031 kernel: stripefs move flags 1,0,0 ids 6122,6122,6122
Jan 14 22:10:07 cl031 kernel: stripefs add_to_requestq cmd 1 fr 3
Jan 14 22:10:08 cl031 kernel: stripefs move flags 0,0,0 ids 6122,6122,6122
Jan 14 22:10:08 cl031 kernel: stripefs rcom status 0 to 1
Jan 14 22:10:08 cl031 kernel: stripefs move flags 0,1,0 ids 6122,6123,6122
Jan 14 22:10:08 cl031 kernel: stripefs move use event 6123
Jan 14 22:10:08 cl031 kernel: stripefs recover event 6123
Jan 14 22:10:08 cl031 kernel: stripefs add node 1
Jan 14 22:10:08 cl031 kernel: stripefs rcom send 1 to 1 id 443
Jan 14 22:10:08 cl031 kernel: stripefs rcom status 4 to 1
Jan 14 22:10:08 cl031 kernel:
jan 14 22:10:08 cl031 kernel: DLM:  Assertion failed on line 128 of file /Views/redhat-cluster/cluster/dlm-kernel/src/reccomms.c
Jan 14 22:10:08 cl031 kernel: DLM:  assertion:  "error >= 0"
Jan 14 22:10:08 cl031 kernel: DLM:  time = 201619244
Jan 14 22:10:08 cl031 kernel: error = -105
Jan 14 22:10:08 cl031 kernel:


>From reccoms.c:
        error = midcomms_send_message(nodeid, (struct dlm_header *) rc,
                                      GFP_KERNEL);
        DLM_ASSERT(error >= 0, printk("error = %d\n", error););


cl030
=====
Jan 14 22:00:38 cl030 kernel: CMAN: removing node cl031a from the cluster : No rresponse to messages
Jan 14 22:00:39 cl030 kernel: dlm: stripefs: nodes_init failed -1
Jan 14 22:00:39 cl030 fence_manual: Node cl031a needs to be reset before
recoverry can procede.  Waiting for cl031a to rejoin the cluster or for
manual acknowleddgement that it has been reset (i.e. fence_ack_manual -s cl031a)
(2 hours and 45 minutes later  Sat Jan 15 00:45:00)
Jan 15 00:50:12 cl030 kernel: CMAN: nmembers in HELLO message from 3 does not maatch our view (got 1, exp 2)
Jan 15 00:52:57 cl030 kernel: CMAN: too many transition restarts - will die
Jan 15 00:52:57 cl030 kernel: CMAN: we are leaving the cluster. Inconsistent cluuster view

cl032 
=====
Jan 14 22:00:38 cl032 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages
Jan 14 22:00:39 cl032 kernel: dlm: stripefs: nodes_reconfig failed 1
Jan 14 22:00:39 cl032 fenced[8983]: fencing deferred to 1
Jan 15 00:50:08 cl032 kernel: CMAN: removing node cl030a from the cluster : No response to messages
Jan 15 00:50:08 cl032 kernel: CMAN: quorum lost, blocking activity
Jan 15 00:53:02 cl032 kernel: SM: 00000001 process_recovery_barrier status=-104


Daniel


From Axel.Thimm at ATrpms.net  Tue Jan 18 02:21:05 2005
From: Axel.Thimm at ATrpms.net (Axel Thimm)
Date: Tue, 18 Jan 2005 03:21:05 +0100
Subject: [Linux-cluster] FC3 and FC2 package backports of GFS
Message-ID: <20050118022105.GU5849@neu.nirvana>

Hi,

I'm starting to push out packages for GFS on FC3 and FC2 under

	     http://atrpms.net/name/cluster/

The userland packages are basically rebuilds of what exists in FC
rawhide.

The kernel module packages are still the same cut from CVS like the
rawhide packages (with minor compile fixes for 2.6.10), but the
packages have been completely restructured to allow for each installed
kernel to have its own non-conflicting copy of the required kernel
modules. I.e. the kernel modules for GFS are in packages called
GFS-kmdl-<kernel uname -r> etc.

There are also packages for qla2xxx, device-mapper with multipath
support as well as multipath-tools for setting up GFS over FC.

Please note that most packages are placed in the "bleeding" repo which
is only for early and experimental packages. Nevertheless feel free to
fry your SANs with them. :)

Thanks!
-- 
Axel.Thimm at ATrpms.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050118/d6556bcc/attachment.sig>

From pcaulfie at redhat.com  Tue Jan 18 08:48:30 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 18 Jan 2005 08:48:30 +0000
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050118084830.GC12101@tykepenguin.com>

On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> My 3 node cluster ran tests for 53 hours before hitting a problem.
> 
> 
> Node cl031 hit the 1st problem CMAN: killed by STARTTRANS or
> NOMINATE.  There is a DLM assert on cl031 also, but that is
> after a whole bunch of debug output.  The full logs are
> here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)
> 
> Any ideas on what is going on?
> 
> Here is simplified output (in the README file):
> test started Jan Wed 12 17:18
> hung after Fri Jan 14 22:00
> 
> cl031 got an error in just under 53 hours.
> ==========================================
> Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages

It's the usual thing. missing messages.

patrick


From pcaulfie at redhat.com  Tue Jan 18 14:01:58 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 18 Jan 2005 14:01:58 +0000
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050118140158.GH12101@tykepenguin.com>

On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> My 3 node cluster ran tests for 53 hours before hitting a problem.

Attached is a patch to set the CMAN process to run at realtime priority, I'm not
sure if that's the right thing to do or not to be honest.

Neither am I sure whether your 48-53 hours is significant - it's possible that
memory may be an issue (only guessing but GFS caches locks like crazy, it may be
worth cutting this down a bit by tweaking

/proc/cluster/lock_dlm/drop_count    and/or
/proc/cluster/lock_dlm/drop_period

otherwise, the only way were gpoing to get to the bottom of this is to enable
"DEBUG_MEMB" in cman and see what it thinks is going on when the node is kicked
out of the cluster.


patrick
-------------- next part --------------
Index: cnxman.c
===================================================================
RCS file: /cvs/cluster/cluster/cman-kernel/src/cnxman.c,v
retrieving revision 1.45
diff -u -p -r1.45 cnxman.c
--- cnxman.c	17 Jan 2005 14:42:36 -0000	1.45
+++ cnxman.c	18 Jan 2005 10:49:50 -0000
@@ -63,6 +63,7 @@ static int is_valid_temp_nodeid(int node
 extern int start_membership_services(pid_t);
 extern int kcl_leave_cluster(int remove);
 extern int send_kill(int nodeid, int needack);
+extern void cman_set_realtime(struct task_struct *tsk, int prio);
 
 static struct proto_ops cl_proto_ops;
 static struct sock *master_sock;
@@ -308,7 +309,7 @@ static int cluster_kthread(void *unused)
 	init_waitqueue_entry(&cnxman_waitq_head, current);
 	add_wait_queue(&cnxman_waitq, &cnxman_waitq_head);
 
-	set_user_nice(current, -6);
+	cman_set_realtime(current, 1);
 
 	/* Allow the sockets to start receiving */
 	list_for_each(socklist, &socket_list) {
Index: membership.c
===================================================================
RCS file: /cvs/cluster/cluster/cman-kernel/src/membership.c,v
retrieving revision 1.47
diff -u -p -r1.47 membership.c
--- membership.c	13 Jan 2005 14:12:59 -0000	1.47
+++ membership.c	18 Jan 2005 10:49:50 -0000
@@ -201,6 +202,13 @@ static uint8_t *node_opinion = NULL;
 #define OPINION_AGREE    1
 #define OPINION_DISAGREE 2
 
+
+void cman_set_realtime(struct task_struct *tsk, int prio)
+{
+        tsk->policy = SCHED_FIFO;
+        tsk->rt_priority = prio;
+}
+
 /* Set node id of a node, also add it to the members array and expand the array
  * if necessary */
 static inline void set_nodeid(struct cluster_node *node, int nodeid)
@@ -281,7 +289,7 @@ static int hello_kthread(void *unused)
 	hello_task = tsk;
 	up(&hello_task_lock);
 
-	set_user_nice(current, -20);
+	cman_set_realtime(current, 1);
 
 	while (node_state != REJECTED && node_state != LEFT_CLUSTER) {
 
@@ -317,7 +325,7 @@ static int membership_kthread(void *unus
 	sigprocmask(SIG_BLOCK, &tmpsig, NULL);
 
 	membership_task = tsk;
-	set_user_nice(current, -5);
+	cman_set_realtime(current, 1);
 
 	/* Open the socket */
 	if (init_membership_services())

From chekov at ucla.edu  Tue Jan 18 22:04:13 2005
From: chekov at ucla.edu (Alan Wood)
Date: Tue, 18 Jan 2005 14:04:13 -0800 (PST)
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 9, Issue 12
In-Reply-To: <20050115170059.3B0E67387F@hormel.redhat.com>
References: <20050115170059.3B0E67387F@hormel.redhat.com>
Message-ID: <Pine.LNX.4.61.0501181401170.13132@c-24-130-248-54.we.client2.attbi.com>

Bujan,

we purchased the AP7900 units and they work fine with the fence_apc module. 
the only issue I had was that fence_apc did not support SSH, though the 
AP7900 does, which isn't an issue if you put your PDUs on their own 
secluded network.  the telnet interface works perfectly.
-alan

On Sat, 15 Jan 2005 linux-cluster-request at redhat.com wrote:

> Date: Fri, 14 Jan 2005 12:58:07 -0500
> From: "Manuel Bujan" <bujan at isqsolutions.com>
> Subject: [Linux-cluster] Which APC fence device ?
> To: "linux clustering" <linux-cluster at redhat.com>
> Message-ID: <005c01c4fa62$9b438910$7801a8c0 at pcbujan>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
>
> Hi,
>
> Could any of you guys can recommend me a working APC Masterswitch model to
> use as a fencing device for our two node GFS cluster ?
>
> We are planning to go in production by the next month and we were using
> until now the fencing manual mechanism.
>
> I looked inside the APC site and I found different models, but I am not
> certainly sure which one to select to be compatible with the fence_apc
> program.
>
> Does fence_apc work with ethernet power switches from APC like the model
> AP7900 ?
> http://www.apc.com/products/family/index.cfm?id=70
>
> Any suggestions
>
> Regards
> Bujan
>
>>
>>


From amanthei at redhat.com  Tue Jan 18 22:36:58 2005
From: amanthei at redhat.com (Adam Manthei)
Date: Tue, 18 Jan 2005 16:36:58 -0600
Subject: [Linux-cluster] gfs probelm
In-Reply-To: <001501c4f971$f294df80$69050364@yazanz>
References: <001501c4f971$f294df80$69050364@yazanz>
Message-ID: <20050118223658.GP1885@redhat.com>

Hi...

On Thu, Jan 13, 2005 at 03:15:26PM +0200, Yazan Al-Sheyyab wrote:
> 
> 
> HI,
> 
>   I configure the gfs , and i mount the partitioned as gfs as mentioned in
> the document, but when make a reboot, the system halted and stay ask
> continuously :  lock_glumd is it running.
> 
>  i didnt put the mounted gfs partitions in the /etc/fstab.  ( is that true
> ?)
> 
> i made a shell and i put in it the following :
>   service ccsd stop
>   service lock_gulmd stop

Why make a shell script if the initscripts are installed on the system?
The easiest way to get GFS start on boot is to make sure that all 4
subsystems for GFS are started.  They also need to be started in the correct
order:

1. service pool start
2. service ccsd start
3. service lock_gulmd start
4. service gfs start

To enable them automatically on the system, use chkconfig to turn them on:

chkconfig pool --add
chkconfig ccsd --add
chkconfig lock_gulmd --add
chkconfig gfs --add

>   and i execut it before i make a reboot, and when i loged again to the
> system the two services are running by the system, that is ok , i know it is
> not a solution , BUT in the second reboot i found that the system gives the
> same continuous error question ( lock_gulmd is it running? ).

you probably aren't running the lock_gulmd server.

> how can i solve this ?
> can i put the partitions in the /etc/fstab ?

You can put GFS in /etc/fstab provided that lock_gulmd is running.  If you
don't want the system to automatically start them, simply add "noauto" to
the parameters list in /etc/fstab.

You might also run into problems with /etc/rc.d/rc.sysinit and
/etc/rc.d/init.d/netfs trying to mount gfs.  If so, add gfs to the
exclusion list so that is looks like the following:

[root at node root]# grep gfs /etc/rc.d/init.d/netfs 
        action $"Mounting other filesystems: " mount -a -t
nonfs,smbfs,ncpfs,gfs

[root at node root]# grep gfs /etc/rc.d/rc.sysinit 
action $"Mounting local filesystems: " mount -a -t nonfs,smbfs,ncpfs,gfs -O
no_netdev

> OR WHAT ?????.

I notice other on the list commenting on /etc/sysconfig/gfs.  This file is
not typically needed, but can be used to help limit what is autodetected on 
your system on startup.  

POOLS specifies the pools to try to load.  If this parameter is blank, it
      the system will try to load all the pools that it can find

CCS_ARCHIVE specifies the ccs archive to use on the system.  If left blank,
      the system will try to load ccs for an archive it find on a pool.  If 
      it doesn't find one, or finds more than one, it will error out if this
      value is not set.

-- 
Adam Manthei  <amanthei at redhat.com>


From amanthei at redhat.com  Tue Jan 18 22:39:46 2005
From: amanthei at redhat.com (Adam Manthei)
Date: Tue, 18 Jan 2005 16:39:46 -0600
Subject: [Linux-cluster] pool
In-Reply-To: <000701c4fc7a$ec88bb50$69050364@yazanz>
References: <000701c4fc7a$ec88bb50$69050364@yazanz>
Message-ID: <20050118223946.GQ1885@redhat.com>

On Mon, Jan 17, 2005 at 11:57:15AM +0200, Yazan Al-Sheyyab wrote:
> hi,
> 
>  can i put partitions created fom LVM as
>   /dev/vg0/r1...../dev/vg0/r18
> 
>     into pools ?  and how?
>  is it the same procedure ?

It is not recommended that you do this.  However, if the device appears in
/proc/partitions, you can put a pool label on it and assemble it.

> how to put them in pool so that when i run
>  pool_tool -s      it doesnot give <error> to them.
> 
> Thanks
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Adam Manthei  <amanthei at redhat.com>


From daniel at osdl.org  Tue Jan 18 23:10:20 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Tue, 18 Jan 2005 15:10:20 -0800
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <20050118084830.GC12101@tykepenguin.com>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
	<20050118084830.GC12101@tykepenguin.com>
Message-ID: <1106089819.15101.10.camel@ibm-c.pdx.osdl.net>

On Tue, 2005-01-18 at 00:48, Patrick Caulfield wrote:
> On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> > My 3 node cluster ran tests for 53 hours before hitting a problem.
> > 
> > 
> > Node cl031 hit the 1st problem CMAN: killed by STARTTRANS or
> > NOMINATE.  There is a DLM assert on cl031 also, but that is
> > after a whole bunch of debug output.  The full logs are
> > here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)
> > 
> > Any ideas on what is going on?
> > 
> > Here is simplified output (in the README file):
> > test started Jan Wed 12 17:18
> > hung after Fri Jan 14 22:00
> > 
> > cl031 got an error in just under 53 hours.
> > ==========================================
> > Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages
> 
> It's the usual thing. missing messages.
> 
> patrick

There is an DLM ASSERT farther down in log that show error = -105
which is ENOBUFS.  Is this happening after the node has decided
to leave the cluster?  I just want to make sure a out of memory
problem isn't causing the problem.

Daniel


From pcaulfie at redhat.com  Wed Jan 19 08:50:08 2005
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 19 Jan 2005 08:50:08 +0000
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <1106089819.15101.10.camel@ibm-c.pdx.osdl.net>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
	<20050118084830.GC12101@tykepenguin.com>
	<1106089819.15101.10.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050119085008.GD11569@tykepenguin.com>

On Tue, Jan 18, 2005 at 03:10:20PM -0800, Daniel McNeil wrote:
> 
> There is an DLM ASSERT farther down in log that show error = -105
> which is ENOBUFS.  Is this happening after the node has decided
> to leave the cluster?  I just want to make sure a out of memory
> problem isn't causing the problem.
> 

Unfortunately it could be, or it may not be. :( 
lowcomms_get_buffer() can return NULL if either a) there is no memory to
allocate a page, or b) the DLM has been shut down. If that happens, -ENOBUFS is
the result. On balance I would suspect that b) is more likely in this situation.

One oddity in that log is that the DLM took 10 minutes to shutdown after CMAN
decided it had to leave the cluster - or did those 34980 lines have to go down a
serial console? 

-- 

patrick


From tboucher at ca.ibm.com  Wed Jan 19 14:44:17 2005
From: tboucher at ca.ibm.com (Tony Boucher)
Date: Wed, 19 Jan 2005 09:44:17 -0500
Subject: [Linux-cluster] IBM Blade center 
In-Reply-To: <Pine.LNX.4.61.0501181401170.13132@c-24-130-248-54.we.client2.attbi.com>
Message-ID: <OF60A03EBB.4A84B427-ON05256F8D.007F23A5-05256F8E.0050F5B3@ca.ibm.com>

Does anyone know of a fencing module that works with IBM Blade center ? We 
want to be able to  fence by rebooting the blade. 

There are some issues with fencing through the McData switch. The servers 
boot from SAN, so when the McData fence module logs in and disables the FC 
port. The whole node gets hosed. (The OS drive gets fenced off too) 


Thanks,

Tony Boucher 
I/T Specialist (HACMP/GPFS/WLM)
2200 Walkley Rd
Ottawa,ON K1G 5L2

tboucher at ca.ibm.com
Cell 613-295-1674
Voice mail 613-247-5289
"Experience is a hard teacher because she gives the test first, the lesson 
afterwards." -- Unknown

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050119/067abd63/attachment.htm>

From amanthei at redhat.com  Wed Jan 19 15:04:47 2005
From: amanthei at redhat.com (Adam Manthei)
Date: Wed, 19 Jan 2005 09:04:47 -0600
Subject: [Linux-cluster] IBM Blade center
In-Reply-To: <OF60A03EBB.4A84B427-ON05256F8D.007F23A5-05256F8E.0050F5B3@ca.ibm.com>
References: <Pine.LNX.4.61.0501181401170.13132@c-24-130-248-54.we.client2.attbi.com>
	<OF60A03EBB.4A84B427-ON05256F8D.007F23A5-05256F8E.0050F5B3@ca.ibm.com>
Message-ID: <20050119150447.GD27578@redhat.com>

On Wed, Jan 19, 2005 at 09:44:17AM -0500, Tony Boucher wrote:
> Does anyone know of a fencing module that works with IBM Blade center ? We 
> want to be able to  fence by rebooting the blade. 

Use fence_bladecenter.  This will require that you have telnet enabled on
your management module (may require a firmware update)

> There are some issues with fencing through the McData switch. The servers 
> boot from SAN, so when the McData fence module logs in and disables the FC 
> port. The whole node gets hosed. (The OS drive gets fenced off too) 

-- 
Adam Manthei  <amanthei at redhat.com>


From dmorgan at gmi-mr.com  Wed Jan 19 18:00:24 2005
From: dmorgan at gmi-mr.com (Duncan Morgan)
Date: Wed, 19 Jan 2005 10:00:24 -0800
Subject: [Linux-cluster] IBM Blade center
In-Reply-To: <20050119150447.GD27578@redhat.com>
Message-ID: <003e01c4fe50$c03a9500$6204570a@DMorganMobile>

We use the Intel version of the Blade Center and wrote a custom fence
script. It is quite easy to do.

Duncan 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Adam Manthei
Sent: Wednesday, January 19, 2005 7:05 AM
To: linux clistering
Subject: Re: [Linux-cluster] IBM Blade center

On Wed, Jan 19, 2005 at 09:44:17AM -0500, Tony Boucher wrote:
> Does anyone know of a fencing module that works with IBM Blade center
? We 
> want to be able to  fence by rebooting the blade. 

Use fence_bladecenter.  This will require that you have telnet enabled
on
your management module (may require a firmware update)

> There are some issues with fencing through the McData switch. The
servers 
> boot from SAN, so when the McData fence module logs in and disables
the FC 
> port. The whole node gets hosed. (The OS drive gets fenced off too) 

-- 
Adam Manthei  <amanthei at redhat.com>

--
Linux-cluster mailing list
Linux-cluster at redhat.com
http://www.redhat.com/mailman/listinfo/linux-cluster

!DSPAM:41ee771331271363136074!


From daniel at osdl.org  Wed Jan 19 18:47:57 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Wed, 19 Jan 2005 10:47:57 -0800
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <20050119085008.GD11569@tykepenguin.com>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
	<20050118084830.GC12101@tykepenguin.com>
	<1106089819.15101.10.camel@ibm-c.pdx.osdl.net>
	<20050119085008.GD11569@tykepenguin.com>
Message-ID: <1106160476.3041.44.camel@ibm-c.pdx.osdl.net>

On Wed, 2005-01-19 at 00:50, Patrick Caulfield wrote:
> On Tue, Jan 18, 2005 at 03:10:20PM -0800, Daniel McNeil wrote:
> > 
> > There is an DLM ASSERT farther down in log that show error = -105
> > which is ENOBUFS.  Is this happening after the node has decided
> > to leave the cluster?  I just want to make sure a out of memory
> > problem isn't causing the problem.
> > 
> 
> Unfortunately it could be, or it may not be. :( 
> lowcomms_get_buffer() can return NULL if either a) there is no memory to
> allocate a page, or b) the DLM has been shut down. If that happens, -ENOBUFS is
> the result. On balance I would suspect that b) is more likely in this situation.
> 
> One oddity in that log is that the DLM took 10 minutes to shutdown after CMAN
> decided it had to leave the cluster - or did those 34980 lines have to go down a
> serial console? 

Yup.  Serial console.

Daniel


From mshk_00 at hotmail.com  Thu Jan 20 10:44:03 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Thu, 20 Jan 2005 11:44:03 +0100
Subject: [Linux-cluster] How install gfs with dm and lvm2?????
Message-ID: <BAY20-F183E93B2D6939A682B042E8D810@phx.gbl>

Hi, I am here again.

I am trying install gfs from cvs with opendlm (something like that) on a 
system with red hat enterprise 3.0, maintaining the kernel 2.4.21.15.EL.

I found in the page 'http://sources.redhat.com/cluster/gfs/' some 
instructions for it,.
I started installing the device-mapper and applying the patch for this 
module to my kernel (following the instructions in the correspondig file 
INSTALL).

But I have found some problems, when I try apply the patchs contained in the 
package device-mapper-... for the device-mapper and the VFS: the system said 
already exits the mayority of the archives and when I try build the kernel 
(once selected the option device mapper support ) gives me some errors:
  error: symbol '_kstrtab_vcalloc' is already defined'
  eroor: symbol '_ksymtab_vcalloc' is already defined'

What happen?? Is this way the most correct or handy?? Someone could to guide 
me ?
Does it exits any manual or recipe that can to help me??

Thanks, i am a bit lost...
   maria

_________________________________________________________________
Moda para esta temporada. Ponte al d?a de todas las tendencias. 
http://www.msn.es/Mujer/moda/default.asp


From teigland at redhat.com  Thu Jan 20 10:57:58 2005
From: teigland at redhat.com (David Teigland)
Date: Thu, 20 Jan 2005 18:57:58 +0800
Subject: [Linux-cluster] How install gfs with dm and lvm2?????
In-Reply-To: <BAY20-F183E93B2D6939A682B042E8D810@phx.gbl>
References: <BAY20-F183E93B2D6939A682B042E8D810@phx.gbl>
Message-ID: <20050120105758.GG23386@redhat.com>

On Thu, Jan 20, 2005 at 11:44:03AM +0100, maria perez wrote:
> Hi, I am here again.
> 
> I am trying install gfs from cvs with opendlm (something like that) on a 
> system with red hat enterprise 3.0, maintaining the kernel 2.4.21.15.EL.

The code on the cvs head requires a 2.6.10 kernel.

> I found in the page 'http://sources.redhat.com/cluster/gfs/' some 
> instructions for it,.
> I started installing the device-mapper and applying the patch for this 
> module to my kernel (following the instructions in the correspondig file 
> INSTALL).

Use these instructions:
http://sources.redhat.com/cluster/doc/usage.txt

(no device-mapper kernel patches are used with 2.6 kernels)

-- 
Dave Teigland  <teigland at redhat.com>


From info at einetmailer.com  Wed Jan 19 09:41:46 2005
From: info at einetmailer.com (Financial Assistance)
Date: Wed, 19 Jan 2005 03:41:46 -0600
Subject: [Linux-cluster] Buried Under Bills?
Message-ID: <200501190938.j0J9cXw8024252@mx1.redhat.com>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050119/aa917929/attachment.htm>

From daniel at osdl.org  Fri Jan 21 23:46:48 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Fri, 21 Jan 2005 15:46:48 -0800
Subject: [Linux-cluster] ccs_tool ld error on latest cvs
Message-ID: <1106351208.14739.8.camel@ibm-c.pdx.osdl.net>

I tried compile the latest cvs tree against 2.6.10 and hit this
loader error.   I'm compiling on redhat 9.

Any ideas?

make[2]: Entering directory `/Views/redhat-cluster/cluster/ccs/ccs_tool'
gcc -Wall -I. -I../config -I../include -I../lib -Wall -O2 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE `xml2-config --cflags` -DCCS_RELEASE_NAME=\"DEVEL.1106267706\" -I. -I../config -I../include -I../lib -o ccs_tool ccs_tool.c update.c upgrade.c old_parser.c -L../lib `xml2-config --libs` -L/Views/redhat-cluster/cluster/build/lib -lccs -lmagma -lmagmamsg -ldl
/usr/lib/libmagma.so: undefined reference to `pthread_rwlock_rdlock'
/usr/lib/libmagma.so: undefined reference to `pthread_rwlock_unlock'
/usr/lib/libmagma.so: undefined reference to `pthread_rwlock_wrlock'
collect2: ld returned 1 exit status
make[2]: *** [ccs_tool] Error 1

Daniel


From mmatus at dinha.acms.arizona.edu  Fri Jan 21 23:54:42 2005
From: mmatus at dinha.acms.arizona.edu (Marcelo Matus)
Date: Fri, 21 Jan 2005 16:54:42 -0700
Subject: [Linux-cluster] cluster failed after 53 hours
In-Reply-To: <20050118140158.GH12101@tykepenguin.com>
References: <1106011893.15101.6.camel@ibm-c.pdx.osdl.net>
	<20050118140158.GH12101@tykepenguin.com>
Message-ID: <41F19642.1080907@acms.arizona.edu>

We also have some crashes when writting very large files, 5GB or so,
and it seems the problem occurs when we hit the GFS cache limit, where
the machine memory is 4GB (Dual Opteron).

Is there a way to tune the GFS cache to use less memory, let say a maximum
512MB, so we can debug the problem better?

And it is either the remote GFS cache or GNBD, since we can write 8GB or 
larger
files when GFS is mounted locally, ie, when we do the tests in the same 
machine
that exports the GFS device, via GNBD, to the rest of the nodes.

Marcelo

Patrick Caulfield wrote:

>On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
>  
>
>>My 3 node cluster ran tests for 53 hours before hitting a problem.
>>    
>>
>
>Attached is a patch to set the CMAN process to run at realtime priority, I'm not
>sure if that's the right thing to do or not to be honest.
>
>Neither am I sure whether your 48-53 hours is significant - it's possible that
>memory may be an issue (only guessing but GFS caches locks like crazy, it may be
>worth cutting this down a bit by tweaking
>
>/proc/cluster/lock_dlm/drop_count    and/or
>/proc/cluster/lock_dlm/drop_period
>
>otherwise, the only way were gpoing to get to the bottom of this is to enable
>"DEBUG_MEMB" in cman and see what it thinks is going on when the node is kicked
>out of the cluster.
>
>
>patrick
>  
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>http://www.redhat.com/mailman/listinfo/linux-cluster
>


From woytek+ at cmu.edu  Sat Jan 22 02:57:19 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Fri, 21 Jan 2005 21:57:19 -0500
Subject: [Linux-cluster] OOM failures with GFS,
	NFS and Samba on a cluster with RHEL3-AS
Message-ID: <41F1C10F.2080309@cmu.edu>

I have been experiencing OOM failures (followed by reboots) on a cluster 
running Dell PowerEdge 1860's (dual-proc, 4GB RAM) with RHEL3-AS with 
all current updates.

The system is configured as a two-member cluster, running GFS 6.0.2-25 
(RH SRPM) and cluster services 1.2.16-1 (also RH SRPM).  My original 
testing went fine with the cluster, including service fail-over and all 
that stuff (only one lock_gulmd, so if the master goes down, the world 
explodes--but I expected that).

Use seemed to be okay, but there weren't a whole lot of users. 
Recently, a project wanted to serve some data from their space in GFS 
via their own machine.  We mounted their space via NFS from the cluster, 
and they serve their data via samba from their machine.  Shortly 
thereafter, two things happened:  more people started to access the 
data, and the cluster machines started to crash.  The symptoms are that 
free memory drops extremely quickly (sometimes more than 3GB disappears 
in less than two minutes).  Load average usually goes up quickly (when I 
can see it).  NFS processes are normally at the top of top, along with 
kswapd.  At some point, around this time, the kernel starts to spit out 
OOM messages and it starts to kill bunches of processes.  The machine 
eventually reboots itself and comes back up cleanly.

Space of outages seems to be dependent on how many people are using the 
system, but I've also seen the machine go down when the backup system 
runs a few backups on the machine.  One of the things I've noticed, 
though, is that the backup system doesn't actually cause the machine to 
crash if the system has been recently rebooted, and memory usage returns 
to normal after the backup is finished.  Memory usage usually does NOT 
return to completely normal after the gigabytes of memory become used 
(when that happens, the machine will sit there and keep running for a 
while with only 20MB or less free, until something presumably tries to 
use that memory and the machine flips out).  That is the only time I've 
seen the backup system cause the system to crash--after it has endured 
significant usage during the day and there are 20MB or less free.

I'll usually get a call from the culprits telling me that they were 
copying either a) lots of files or b) large files to the cluster.

Any ideas here?  Anything I can look at to tune?

jonathan


From woytek+ at cmu.edu  Sun Jan 23 18:45:28 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Sun, 23 Jan 2005 13:45:28 -0500
Subject: [Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster
	with RHEL3-AS
In-Reply-To: <41F1C10F.2080309@cmu.edu>
References: <41F1C10F.2080309@cmu.edu>
Message-ID: <41F3F0C8.7020906@cmu.edu>

Additional information:

I enabled full output on lock_gulmd, since my dead top sessions would 
often show that process near the top of the list around the time of 
crashes.  The machine was rebooted around 10:50AM, and was down again at 
12:44.  In the span of less than a minute, the machine plowed through 
over 3GB of memory and crashed.  The extra debugging information from 
lock_gulmd said nothing, except that there was a successful heartbeat. 
The OOM messages began at 12:44:01, and the machine was dead somewhere 
around 12:44:40.  Nobody should be using the machine during this time. 
A cron job that was scheduled to fire off at 12:44 (it runs every two 
minutes to check memory usage, specifically to try to track this 
problem) did not run (or at least was not logged if it did).  I took 
that job out of cron just to make sure that it isn't part of the 
problem.  The low-memory-check that ran at 12:42 reported nothing, and 
my threshold for that is set at 512MB.

The span between crashes this weekend has been between three and eight 
hours.  Yesterday, the machine rebooted (looking at lastlog, not last 
message before restart in /var/log/messages, but I'll be looking at that 
in a bit) at 15:20 (after being up since 23:50 on Friday), 18:27, 21:43, 
  onto sunday at 01:14, 04:33, and finally 12:48.  Something seems quite 
wrong with this.

jonathan


Jonathan Woytek wrote:

> I have been experiencing OOM failures (followed by reboots) on a cluster 
> running Dell PowerEdge 1860's (dual-proc, 4GB RAM) with RHEL3-AS with 
> all current updates.
> 
> The system is configured as a two-member cluster, running GFS 6.0.2-25 
> (RH SRPM) and cluster services 1.2.16-1 (also RH SRPM).  My original 
> testing went fine with the cluster, including service fail-over and all 
> that stuff (only one lock_gulmd, so if the master goes down, the world 
> explodes--but I expected that).
> 
> Use seemed to be okay, but there weren't a whole lot of users. Recently, 
> a project wanted to serve some data from their space in GFS via their 
> own machine.  We mounted their space via NFS from the cluster, and they 
> serve their data via samba from their machine.  Shortly thereafter, two 
> things happened:  more people started to access the data, and the 
> cluster machines started to crash.  The symptoms are that free memory 
> drops extremely quickly (sometimes more than 3GB disappears in less than 
> two minutes).  Load average usually goes up quickly (when I can see 
> it).  NFS processes are normally at the top of top, along with kswapd.  
> At some point, around this time, the kernel starts to spit out OOM 
> messages and it starts to kill bunches of processes.  The machine 
> eventually reboots itself and comes back up cleanly.
> 
> Space of outages seems to be dependent on how many people are using the 
> system, but I've also seen the machine go down when the backup system 
> runs a few backups on the machine.  One of the things I've noticed, 
> though, is that the backup system doesn't actually cause the machine to 
> crash if the system has been recently rebooted, and memory usage returns 
> to normal after the backup is finished.  Memory usage usually does NOT 
> return to completely normal after the gigabytes of memory become used 
> (when that happens, the machine will sit there and keep running for a 
> while with only 20MB or less free, until something presumably tries to 
> use that memory and the machine flips out).  That is the only time I've 
> seen the backup system cause the system to crash--after it has endured 
> significant usage during the day and there are 20MB or less free.
> 
> I'll usually get a call from the culprits telling me that they were 
> copying either a) lots of files or b) large files to the cluster.
> 
> Any ideas here?  Anything I can look at to tune?
> 
> jonathan
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From woytek+ at cmu.edu  Thu Jan 20 21:56:03 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Thu, 20 Jan 2005 16:56:03 -0500
Subject: [Linux-cluster] OOM issues with GFS, NFS, Samba on RHEL3-AS cluster
Message-ID: <41F028F3.7020506@cmu.edu>

Hello.  I've tried to read-up on the lists here to see what I can find 
about these sorts of issues, but the information appears to be somewhat 
sparse.

Here's my situation:  I have a two-member cluster built on RHEL 3 AS 
(with all current updates installed).  That means kernel 
2.4.21-27.0.2.EL with GFS (6.0.2-25) and cluster services (1.2.16-1) 
built from SRPMS distributed by RedHat.  My storage is iSCSI-based over 
gigabit ethernet.  Hardware are Dell PowerEdge 1860's with 4GB of RAM 
and dual 2.4GHz processors.

My problem is that the node serving disk via NFS and Samba gets into a 
strange mode where it starts to get kernel-based out-of-memory errors, 
which start to kill things off.  The machine reboots itself and comes 
back up with no issues.  In the process, of course, it wreaks havoc with 
lock_gulmd and a host of other things, and makes a bunch of users upset 
(it probably didn't help that we've been dealing with unstable storage 
here for a while, and I put this system together with the idea that it 
would be more reliable).

I plan on trying to add a third node, which would fix the lock_gulmd 
craziness.  That's not my big problem, though.  I NEED to figure out why 
this is happening.  My analysis so far seems to indicate that the 
crashes are caused mostly when there are a lot of files open (or at 
least a lot of disk activity).  The failures seem to occur most often 
when people are accessing data (on GFS) from the server over an NFS 
mount to another machine, but they also seem to occur if the machine has 
seen a day's worth of that sort of usage and the backup system tries to 
get its nightly backup between 11PM and 2AM.  When memory starts to get 
low, kswapd shows up and starts eating serious cycles, along with the 
nfsd's.  I've tried increasing the number of nfsd's, but that didn't 
seem to have an effect.

Any ideas on things I should be checking?  Interestingly enough, no swap 
seems to be used when this happens.  The load average normally creeps up 
right before death, and the machine gets down to less than 18MB free 
(though a lot the 4GB is tied up in cache).

jonathan
--
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request


From pierre.filippone at retail-sc.com  Fri Jan 21 11:21:59 2005
From: pierre.filippone at retail-sc.com (Pierre Filippone)
Date: Fri, 21 Jan 2005 12:21:59 +0100
Subject: [Linux-cluster] Cluster aware software raid
Message-ID: <OF4635C11E.879C9E44-ONC1256F90.003C0756-C1256F90.003E6EA9@retail-sc.com>

Hi,

we are trying to use GFS in a FC environment on a two node cluster. 
Additionally to multipathing (performed by IBM's SDD) we want to mirror 
the data on two SAN storages via any kind of raid software. 

As far as I understood, there is currently no software available for linux 
(except Veritas VM) which is able to support this scenario.
I read that CLVM will probably support cluster aware mirroring in the 
future. Are there any estimations, when it will be production ready ?
Will it be released with RH ES 4 ?

In some news groups I saw older postings discussing how to make md cluster 
aware. But, afaik, this also did not happen yet. 
Do you know any other project, which is near to finishing this feature ?

Thanks for your help,

Pierre Filippone
RSC Commercial Services OHG
Bleichstr. 8
40211 D?sseldorf


From woytek+ at cmu.edu  Sun Jan 23 23:12:18 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Sun, 23 Jan 2005 18:12:18 -0500
Subject: [Linux-cluster] OOM issues with GFS, NFS,
	Samba on RHEL3-AS cluster
In-Reply-To: <41F028F3.7020506@cmu.edu>
References: <41F028F3.7020506@cmu.edu>
Message-ID: <41F42F52.3060502@cmu.edu>

Sorry about the duplicate message--I had sent this when I had a mistake 
in my email address.  When I fixed it, this message apparently went 
through to the list.

jonathan


Jonathan Woytek wrote:

> Hello.  I've tried to read-up on the lists here to see what I can find 
> about these sorts of issues, but the information appears to be somewhat 
> sparse.
> 
> Here's my situation:  I have a two-member cluster built on RHEL 3 AS 
> (with all current updates installed).  That means kernel 
> 2.4.21-27.0.2.EL with GFS (6.0.2-25) and cluster services (1.2.16-1) 
> built from SRPMS distributed by RedHat.  My storage is iSCSI-based over 
> gigabit ethernet.  Hardware are Dell PowerEdge 1860's with 4GB of RAM 
> and dual 2.4GHz processors.
> 
> My problem is that the node serving disk via NFS and Samba gets into a 
> strange mode where it starts to get kernel-based out-of-memory errors, 
> which start to kill things off.  The machine reboots itself and comes 
> back up with no issues.  In the process, of course, it wreaks havoc with 
> lock_gulmd and a host of other things, and makes a bunch of users upset 
> (it probably didn't help that we've been dealing with unstable storage 
> here for a while, and I put this system together with the idea that it 
> would be more reliable).
> 
> I plan on trying to add a third node, which would fix the lock_gulmd 
> craziness.  That's not my big problem, though.  I NEED to figure out why 
> this is happening.  My analysis so far seems to indicate that the 
> crashes are caused mostly when there are a lot of files open (or at 
> least a lot of disk activity).  The failures seem to occur most often 
> when people are accessing data (on GFS) from the server over an NFS 
> mount to another machine, but they also seem to occur if the machine has 
> seen a day's worth of that sort of usage and the backup system tries to 
> get its nightly backup between 11PM and 2AM.  When memory starts to get 
> low, kswapd shows up and starts eating serious cycles, along with the 
> nfsd's.  I've tried increasing the number of nfsd's, but that didn't 
> seem to have an effect.
> 
> Any ideas on things I should be checking?  Interestingly enough, no swap 
> seems to be used when this happens.  The load average normally creeps up 
> right before death, and the machine gets down to less than 18MB free 
> (though a lot the 4GB is tied up in cache).
> 
> jonathan
> -- 
> Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
> NREC Computing Manager          c: 412-401-1627         KB3HOZ
> PGP Key available upon request
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request


From woytek+ at cmu.edu  Mon Jan 24 04:27:52 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Sun, 23 Jan 2005 23:27:52 -0500
Subject: [Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster
	with RHEL3-AS
In-Reply-To: <41F3F0C8.7020906@cmu.edu>
References: <41F1C10F.2080309@cmu.edu> <41F3F0C8.7020906@cmu.edu>
Message-ID: <41F47948.6050501@cmu.edu>

Even more additional information:

I've been monitoring the system through a few crashes now, and it looks 
like what is actually running out of memory is "lowmem".  The system 
seems to eat about 130-140kB every two seconds.  It seems that the 
system is NOT actually plowing through 3GB+ of memory--highmem does not 
seem to drop.

Whee fun.

jonathan


Jonathan Woytek wrote:

> Additional information:
> 
> I enabled full output on lock_gulmd, since my dead top sessions would 
> often show that process near the top of the list around the time of 
> crashes.  The machine was rebooted around 10:50AM, and was down again at 
> 12:44.  In the span of less than a minute, the machine plowed through 
> over 3GB of memory and crashed.  The extra debugging information from 
> lock_gulmd said nothing, except that there was a successful heartbeat. 
> The OOM messages began at 12:44:01, and the machine was dead somewhere 
> around 12:44:40.  Nobody should be using the machine during this time. A 
> cron job that was scheduled to fire off at 12:44 (it runs every two 
> minutes to check memory usage, specifically to try to track this 
> problem) did not run (or at least was not logged if it did).  I took 
> that job out of cron just to make sure that it isn't part of the 
> problem.  The low-memory-check that ran at 12:42 reported nothing, and 
> my threshold for that is set at 512MB.
> 
> The span between crashes this weekend has been between three and eight 
> hours.  Yesterday, the machine rebooted (looking at lastlog, not last 
> message before restart in /var/log/messages, but I'll be looking at that 
> in a bit) at 15:20 (after being up since 23:50 on Friday), 18:27, 21:43, 
>  onto sunday at 01:14, 04:33, and finally 12:48.  Something seems quite 
> wrong with this.
> 
> jonathan
> 
> 
> Jonathan Woytek wrote:
> 
>> I have been experiencing OOM failures (followed by reboots) on a 
>> cluster running Dell PowerEdge 1860's (dual-proc, 4GB RAM) with 
>> RHEL3-AS with all current updates.
>>
>> The system is configured as a two-member cluster, running GFS 6.0.2-25 
>> (RH SRPM) and cluster services 1.2.16-1 (also RH SRPM).  My original 
>> testing went fine with the cluster, including service fail-over and 
>> all that stuff (only one lock_gulmd, so if the master goes down, the 
>> world explodes--but I expected that).
>>
>> Use seemed to be okay, but there weren't a whole lot of users. 
>> Recently, a project wanted to serve some data from their space in GFS 
>> via their own machine.  We mounted their space via NFS from the 
>> cluster, and they serve their data via samba from their machine.  
>> Shortly thereafter, two things happened:  more people started to 
>> access the data, and the cluster machines started to crash.  The 
>> symptoms are that free memory drops extremely quickly (sometimes more 
>> than 3GB disappears in less than two minutes).  Load average usually 
>> goes up quickly (when I can see it).  NFS processes are normally at 
>> the top of top, along with kswapd.  At some point, around this time, 
>> the kernel starts to spit out OOM messages and it starts to kill 
>> bunches of processes.  The machine eventually reboots itself and comes 
>> back up cleanly.
>>
>> Space of outages seems to be dependent on how many people are using 
>> the system, but I've also seen the machine go down when the backup 
>> system runs a few backups on the machine.  One of the things I've 
>> noticed, though, is that the backup system doesn't actually cause the 
>> machine to crash if the system has been recently rebooted, and memory 
>> usage returns to normal after the backup is finished.  Memory usage 
>> usually does NOT return to completely normal after the gigabytes of 
>> memory become used (when that happens, the machine will sit there and 
>> keep running for a while with only 20MB or less free, until something 
>> presumably tries to use that memory and the machine flips out).  That 
>> is the only time I've seen the backup system cause the system to 
>> crash--after it has endured significant usage during the day and there 
>> are 20MB or less free.
>>
>> I'll usually get a call from the culprits telling me that they were 
>> copying either a) lots of files or b) large files to the cluster.
>>
>> Any ideas here?  Anything I can look at to tune?
>>
>> jonathan
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> http://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From jaime at iaa.es  Mon Jan 24 09:18:49 2005
From: jaime at iaa.es (Jaime Perea)
Date: Mon, 24 Jan 2005 10:18:49 +0100
Subject: [Linux-cluster] ccs_tool ld error on latest cvs
In-Reply-To: <1106351208.14739.8.camel@ibm-c.pdx.osdl.net>
References: <1106351208.14739.8.camel@ibm-c.pdx.osdl.net>
Message-ID: <200501241018.49756.jaime@iaa.es>

Hi everybody,

My first posting!

Perhaps doing 
LDFLAGS="-lpthread"  make 

could work. 

-- 

           Jaime D. Perea Duarte. <jaime at iaa dot es>
             Linux registered user #10472

           Dep. Astrofisica Extragalactica.
           Instituto de Astrofisica de Andalucia (CSIC)
           Apdo. 3004, 18080 Granada, Spain. 


El S?bado, 22 de Enero de 2005 00:46, Daniel McNeil escribi?:
> I tried compile the latest cvs tree against 2.6.10 and hit this
> loader error.   I'm compiling on redhat 9.
>
> Any ideas?
>
> make[2]: Entering directory `/Views/redhat-cluster/cluster/ccs/ccs_tool'
> gcc -Wall -I. -I../config -I../include -I../lib -Wall -O2
> -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE `xml2-config --cflags`
> -DCCS_RELEASE_NAME=\"DEVEL.1106267706\" -I. -I../config -I../include
> -I../lib -o ccs_tool ccs_tool.c update.c upgrade.c old_parser.c -L../lib
> `xml2-config --libs` -L/Views/redhat-cluster/cluster/build/lib -lccs
> -lmagma -lmagmamsg -ldl /usr/lib/libmagma.so: undefined reference to
> `pthread_rwlock_rdlock' /usr/lib/libmagma.so: undefined reference to
> `pthread_rwlock_unlock' /usr/lib/libmagma.so: undefined reference to
> `pthread_rwlock_wrlock' collect2: ld returned 1 exit status
> make[2]: *** [ccs_tool] Error 1
>
> Daniel
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From mtilstra at redhat.com  Mon Jan 24 14:38:33 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Mon, 24 Jan 2005 08:38:33 -0600
Subject: [Linux-cluster] OOM failures with GFS,
	NFS and Samba on a cluster with RHEL3-AS
In-Reply-To: <41F3F0C8.7020906@cmu.edu>
References: <41F1C10F.2080309@cmu.edu> <41F3F0C8.7020906@cmu.edu>
Message-ID: <20050124143833.GA30145@redhat.com>

On Sun, Jan 23, 2005 at 01:45:28PM -0500, Jonathan Woytek wrote:
> Additional information:
> 
> I enabled full output on lock_gulmd, since my dead top sessions would 
> often show that process near the top of the list around the time of 
> crashes.  The machine was rebooted around 10:50AM, and was down again at 

Not suprising that lock_gulmd is working hard when gfs is under heavy
use.  Its it busy processing all those lock requests.  What would be
more useful from gulm for this than the logging messages, is to query
the locktable every so often for its stats.
`gulm_tool getstats <master>:lt000`
The 'locks = ###' line is how many lock structures are current held.
gulm is very greedy about memory, and you are running the lock servers
on the same nodes you're mounting from.


also, just to see if I read the first post right, you have
samba->nfs->gfs?

-- 
Michael Conrad Tadpol Tilstra
i'm trying to think, but nothing's happening...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050124/da298e2c/attachment.sig>

From woytek+ at cmu.edu  Mon Jan 24 18:36:47 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Mon, 24 Jan 2005 13:36:47 -0500
Subject: [Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster
	with RHEL3-AS
In-Reply-To: <20050124143833.GA30145@redhat.com>
References: <41F1C10F.2080309@cmu.edu> <41F3F0C8.7020906@cmu.edu>
	<20050124143833.GA30145@redhat.com>
Message-ID: <41F5403F.4070207@cmu.edu>

Michael Conrad Tadpol Tilstra wrote:
> On Sun, Jan 23, 2005 at 01:45:28PM -0500, Jonathan Woytek wrote:
> 
>>Additional information:
>>
>>I enabled full output on lock_gulmd, since my dead top sessions would 
>>often show that process near the top of the list around the time of 
>>crashes.  The machine was rebooted around 10:50AM, and was down again at 
> 
> 
> Not suprising that lock_gulmd is working hard when gfs is under heavy
> use.  Its it busy processing all those lock requests.  What would be
> more useful from gulm for this than the logging messages, is to query
> the locktable every so often for its stats.
> `gulm_tool getstats <master>:lt000`
> The 'locks = ###' line is how many lock structures are current held.
> gulm is very greedy about memory, and you are running the lock servers
> on the same nodes you're mounting from.

Here are the stats from the master lock_gulmd lt000:

I_am = Master
run time = 9436
pid = 2205
verbosity = Default
id = 0
partitions = 1
out_queue = 0
drpb_queue = 0
locks = 20356
unlocked = 17651
exclusive = 15
shared = 2690
deferred = 0
lvbs = 17661
expired = 0
lock ops = 107354
conflicts = 0
incomming_queue = 0
conflict_queue = 0
reply_queue = 0
free_locks = 69644
free_lkrqs = 60
used_lkrqs = 0
free_holders = 109634
used_holders = 20366
highwater = 1048576


Something keeps eating away at lowmem, though, and I still can't figure 
out what exactly it is.


> also, just to see if I read the first post right, you have
> samba->nfs->gfs?

If I understand your arrows correctly, I have a filesystem mounted with 
GFS that I'm sharing via NFS to another machine that is sharing it via 
Samba.  I've closed that link, though, to try to eliminate that as a 
problem.  So now I'm serving the GFS filesystem directly through Samba.

jonathan

-- 
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request


From woytek+ at cmu.edu  Mon Jan 24 18:43:29 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Mon, 24 Jan 2005 13:43:29 -0500
Subject: [Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster
	with RHEL3-AS
In-Reply-To: <41F5403F.4070207@cmu.edu>
References: <41F1C10F.2080309@cmu.edu>
	<41F3F0C8.7020906@cmu.edu>	<20050124143833.GA30145@redhat.com>
	<41F5403F.4070207@cmu.edu>
Message-ID: <41F541D1.5050305@cmu.edu>

/proc/meminfo:
         total:    used:    free:  shared: buffers:  cached:
Mem:  4189741056 925650944 3264090112        0 18685952 76009472
Swap: 2146787328        0 2146787328
MemTotal:      4091544 kB
MemFree:       3187588 kB
MemShared:           0 kB
Buffers:         18248 kB
Cached:          74228 kB
SwapCached:          0 kB
Active:         107232 kB
ActiveAnon:      50084 kB
ActiveCache:     57148 kB
Inact_dirty:      1892 kB
Inact_laundry:   16276 kB
Inact_clean:     16616 kB
Inact_target:    28400 kB
HighTotal:     3276544 kB
HighFree:      3164096 kB
LowTotal:       815000 kB
LowFree:         23492 kB
SwapTotal:     2096472 kB
SwapFree:      2096472 kB
Committed_AS:    72244 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

When a bunch of locks become free, lowmem seems to recover somewhat. 
However, shutting down lock_gulmd entirely does NOT return lowmem to 
what it probably should be (though I'm not sure if the system is just 
keeping all of that memory cached until something else needs it or not).

jonathan

Jonathan Woytek wrote:
> Michael Conrad Tadpol Tilstra wrote:
> 
>> On Sun, Jan 23, 2005 at 01:45:28PM -0500, Jonathan Woytek wrote:
>>
>>> Additional information:
>>>
>>> I enabled full output on lock_gulmd, since my dead top sessions would 
>>> often show that process near the top of the list around the time of 
>>> crashes.  The machine was rebooted around 10:50AM, and was down again at 
>>
>>
>>
>> Not suprising that lock_gulmd is working hard when gfs is under heavy
>> use.  Its it busy processing all those lock requests.  What would be
>> more useful from gulm for this than the logging messages, is to query
>> the locktable every so often for its stats.
>> `gulm_tool getstats <master>:lt000`
>> The 'locks = ###' line is how many lock structures are current held.
>> gulm is very greedy about memory, and you are running the lock servers
>> on the same nodes you're mounting from.
> 
> 
> Here are the stats from the master lock_gulmd lt000:
> 
> I_am = Master
> run time = 9436
> pid = 2205
> verbosity = Default
> id = 0
> partitions = 1
> out_queue = 0
> drpb_queue = 0
> locks = 20356
> unlocked = 17651
> exclusive = 15
> shared = 2690
> deferred = 0
> lvbs = 17661
> expired = 0
> lock ops = 107354
> conflicts = 0
> incomming_queue = 0
> conflict_queue = 0
> reply_queue = 0
> free_locks = 69644
> free_lkrqs = 60
> used_lkrqs = 0
> free_holders = 109634
> used_holders = 20366
> highwater = 1048576
> 
> 
> Something keeps eating away at lowmem, though, and I still can't figure 
> out what exactly it is.
> 
> 
>> also, just to see if I read the first post right, you have
>> samba->nfs->gfs?
> 
> 
> If I understand your arrows correctly, I have a filesystem mounted with 
> GFS that I'm sharing via NFS to another machine that is sharing it via 
> Samba.  I've closed that link, though, to try to eliminate that as a 
> problem.  So now I'm serving the GFS filesystem directly through Samba.
> 
> jonathan
> 

-- 
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request


From laza at yu.net  Mon Jan 24 19:57:28 2005
From: laza at yu.net (Lazar Obradovic)
Date: Mon, 24 Jan 2005 20:57:28 +0100
Subject: [Linux-cluster] multipath/gfs lockout under heavy write
Message-ID: <1106596648.13534.79.camel@laza.eunet.yu>

Hello,  

I'm not quite sure if the problem I'm experiencing is GFS or
dm-multi/multipath issue, so I'm writing to both lists... sorry for that
and please trim as soon as you realise who is it for.

This is the scenario: 

I've created two-node cluster and mounted two LVs on each of them: 

/dev/vg/data on /mnt/data type gfs (rw)
/dev/vg/syslog on /var/log/ng type gfs (rw)

Each node is running 2.6.10 with udm2 patch set, GFS and LVM2 fetched
from CVS on Jan, 19th and multipath-tools-0.4.1. Storage controller is
HSV110, and has two paths from each server to it: 

# multipath -v2 
create: 3600508b400013a6c00006000009c0000
[size=500 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [first]
  \_ 0:0:0:1 sda  8:0     [faulty]
  \_ 0:0:1:1 sdb  8:16    [ready ]
  \_ 0:0:2:1 sdc  8:32    [faulty]
  \_ 0:0:3:1 sdd  8:48    [ready ]

I tried to copy 100Gb of large files (each of them is about 15Gb) to
a /mnt/data through SSH connection from the third server to one of the
clustered. Looking at switch statistics, I saw that traffic was indeed
balanced over both FC links, but after copying almost 80Gb, without any
reason or unusual event on SAN/storage side, /dev/vg/data reported: 

SCSI error : <0 0 1 1> return code = 0x20000
end_request: I/O error, dev sdb, sector 401320376
end_request: I/O error, dev sdb, sector 401320384
Device sda not ready.
SCSI error : <0 0 3 1> return code = 0x20000
end_request: I/O error, dev sdd, sector 401321168
end_request: I/O error, dev sdd, sector 401321176
Buffer I/O error on device diapered_dm-2, logical block 37057899
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057900
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057901
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057902
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057903
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057904
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057905
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057906
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057907
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37057908
lost page write due to I/O error on diapered_dm-2
GFS: fsid=admin:data.0: fatal: I/O error
GFS: fsid=admin:data.0:   block = 37057898
GFS: fsid=admin:data.0:   function = gfs_dwrite
GFS: fsid=admin:data.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 651
GFS: fsid=admin:data.0:   time = 1106582338
GFS: fsid=admin:data.0: about to withdraw from the cluster
GFS: fsid=admin:data.0: waiting for outstanding I/O
SCSI error : <0 0 1 1> return code = 0x20000
Device sdc not ready.
GFS: fsid=admin:data.0: warning: assertion "!buffer_busy(bh)" failed
GFS: fsid=admin:data.0:   function = gfs_logbh_uninit
GFS: fsid=admin:data.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 930
GFS: fsid=admin:data.0:   time = 1106582351
printk: 54 messages suppressed.
Buffer I/O error on device diapered_dm-2, logical block 36272387
lost page write due to I/O error on diapered_dm-2
Buffer I/O error on device diapered_dm-2, logical block 37024703
lost page write due to I/O error on diapered_dm-2
GFS: fsid=admin:data.0: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=admin:data.0: withdrawn
printk: 12 messages suppressed.
Buffer I/O error on device diapered_dm-2, logical block 37005453
lost page write due to I/O error on diapered_dm-2
printk: 1036 messages suppressed.
Buffer I/O error on device diapered_dm-2, logical block 37006489
lost page write due to I/O error on diapered_dm-2
printk: 1035 messages suppressed.
Buffer I/O error on device diapered_dm-2, logical block 37007525
lost page write due to I/O error on diapered_dm-2

while /dev/vg/syslog continued to work as usual (dd-ing /dev/zero to
some file worked like a charm). After that error, SCP died, and I
couldn't umount nor remount that filesystem. Fenced didn't triggered so
I had to reboot the machine in order to make it work again (and I'm
using fence_ibmblade which works on another cluster I have).

Since both LVs are a part of same VG (and, thus, are using the same
physical device seen over multipath), I'd guess the problem is somewhere
inside GFS, but the things that keep confusing me are: 

- those SCSI errors that look like multipath errors
- name 'diapered_dm-2' which I never saw before
- fenced not fencing obviously faulty node

What else do you need to debug this issue?

Once again, sorry for the cross-post... 

-- 
Lazar Obradovic <laza at yu.net>
YUnet International, NOC


From woytek+ at cmu.edu  Mon Jan 24 21:37:54 2005
From: woytek+ at cmu.edu (Jonathan Woytek)
Date: Mon, 24 Jan 2005 16:37:54 -0500
Subject: [Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster
	with RHEL3-AS
In-Reply-To: <41F541D1.5050305@cmu.edu>
References: <41F1C10F.2080309@cmu.edu>
	<41F3F0C8.7020906@cmu.edu>	<20050124143833.GA30145@redhat.com>
	<41F5403F.4070207@cmu.edu> <41F541D1.5050305@cmu.edu>
Message-ID: <41F56AB2.90906@cmu.edu>

Yet more and more info:

Jan 24 16:17:00 quicksilver kernel: Mem-info:
Jan 24 16:17:00 quicksilver kernel: Zone:DMA freepages:  2835 min:     0 
low:     0 high:     0
Jan 24 16:17:00 quicksilver kernel: Zone:Normal freepages:  1034 min: 
1279 low:  4544 high:  6304
Jan 24 16:17:00 quicksilver kernel: Zone:HighMem freepages:759901 min: 
  255 low: 15872 high: 23808
Jan 24 16:17:00 quicksilver kernel: Free pages:      763768 (759901 HighMem)
Jan 24 16:17:00 quicksilver kernel: ( Active: 22610/25584, 
inactive_laundry: 3922, inactive_clean: 3890, free: 763768 )
Jan 24 16:17:00 quicksilver kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2835
Jan 24 16:17:00 quicksilver kernel:   aa:0 ac:27 id:0 il:115 ic:0 fr:1026
Jan 24 16:17:00 quicksilver kernel:   aa:12742 ac:9847 id:25584 il:3807 
ic:3890 fr:759901
Jan 24 16:17:00 quicksilver kernel: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 
0*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11340kB)
Jan 24 16:17:00 quicksilver kernel: 272*4kB 19*8kB 1*16kB 1*32kB 1*64kB 
1*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3784kB)
Jan 24 16:17:01 quicksilver kernel: 43*4kB 17*8kB 2*16kB 7*32kB 1*64kB 
78*128kB 138*256kB 89*512kB 83*1024kB 32*2048kB 683*4096kB =
3039604kB)
Jan 24 16:17:01 quicksilver kernel: Swap cache: add 0, delete 0, find 
0/0, race 0+0
Jan 24 16:17:01 quicksilver kernel: 197629 pages of slabcache
Jan 24 16:17:01 quicksilver kernel: 328 pages of kernel stacks
Jan 24 16:17:01 quicksilver kernel: 0 lowmem pagetables, 529 highmem 
pagetables
Jan 24 16:17:01 quicksilver kernel: Free swap:       2096472kB
Jan 24 16:17:01 quicksilver kernel: 1245184 pages of RAM
Jan 24 16:17:01 quicksilver kernel: 819136 pages of HIGHMEM
Jan 24 16:17:01 quicksilver kernel: 222298 reserved pages
Jan 24 16:17:01 quicksilver kernel: 38487 pages shared
Jan 24 16:17:01 quicksilver kernel: 0 pages swap cached
Jan 24 16:17:01 quicksilver kernel: Out of Memory: Killed process 2441 
(sendmail).
Jan 24 16:17:01 quicksilver kernel: Out of Memory: Killed process 2441 
(sendmail).
Jan 24 16:17:01 quicksilver kernel: Fixed up OOM kill of mm-less task


The machine reports OOM kills for about 15-30 seconds before clumembd 
gets killed and the machine reboots.

The OOM kills usually begin at the top of the minute, though that 
probably doesn't have anything to do with anything except coincidence.

jonathan


Jonathan Woytek wrote:
> /proc/meminfo:
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  4189741056 925650944 3264090112        0 18685952 76009472
> Swap: 2146787328        0 2146787328
> MemTotal:      4091544 kB
> MemFree:       3187588 kB
> MemShared:           0 kB
> Buffers:         18248 kB
> Cached:          74228 kB
> SwapCached:          0 kB
> Active:         107232 kB
> ActiveAnon:      50084 kB
> ActiveCache:     57148 kB
> Inact_dirty:      1892 kB
> Inact_laundry:   16276 kB
> Inact_clean:     16616 kB
> Inact_target:    28400 kB
> HighTotal:     3276544 kB
> HighFree:      3164096 kB
> LowTotal:       815000 kB
> LowFree:         23492 kB
> SwapTotal:     2096472 kB
> SwapFree:      2096472 kB
> Committed_AS:    72244 kB
> HugePages_Total:     0
> HugePages_Free:      0
> Hugepagesize:     2048 kB
> 
> When a bunch of locks become free, lowmem seems to recover somewhat. 
> However, shutting down lock_gulmd entirely does NOT return lowmem to 
> what it probably should be (though I'm not sure if the system is just 
> keeping all of that memory cached until something else needs it or not).
> 
> jonathan
> 
> Jonathan Woytek wrote:
> 
>> Michael Conrad Tadpol Tilstra wrote:
>>
>>> On Sun, Jan 23, 2005 at 01:45:28PM -0500, Jonathan Woytek wrote:
>>>
>>>> Additional information:
>>>>
>>>> I enabled full output on lock_gulmd, since my dead top sessions 
>>>> would often show that process near the top of the list around the 
>>>> time of crashes.  The machine was rebooted around 10:50AM, and was 
>>>> down again at 
>>>
>>>
>>>
>>>
>>> Not suprising that lock_gulmd is working hard when gfs is under heavy
>>> use.  Its it busy processing all those lock requests.  What would be
>>> more useful from gulm for this than the logging messages, is to query
>>> the locktable every so often for its stats.
>>> `gulm_tool getstats <master>:lt000`
>>> The 'locks = ###' line is how many lock structures are current held.
>>> gulm is very greedy about memory, and you are running the lock servers
>>> on the same nodes you're mounting from.
>>
>>
>>
>> Here are the stats from the master lock_gulmd lt000:
>>
>> I_am = Master
>> run time = 9436
>> pid = 2205
>> verbosity = Default
>> id = 0
>> partitions = 1
>> out_queue = 0
>> drpb_queue = 0
>> locks = 20356
>> unlocked = 17651
>> exclusive = 15
>> shared = 2690
>> deferred = 0
>> lvbs = 17661
>> expired = 0
>> lock ops = 107354
>> conflicts = 0
>> incomming_queue = 0
>> conflict_queue = 0
>> reply_queue = 0
>> free_locks = 69644
>> free_lkrqs = 60
>> used_lkrqs = 0
>> free_holders = 109634
>> used_holders = 20366
>> highwater = 1048576
>>
>>
>> Something keeps eating away at lowmem, though, and I still can't 
>> figure out what exactly it is.
>>
>>
>>> also, just to see if I read the first post right, you have
>>> samba->nfs->gfs?
>>
>>
>>
>> If I understand your arrows correctly, I have a filesystem mounted 
>> with GFS that I'm sharing via NFS to another machine that is sharing 
>> it via Samba.  I've closed that link, though, to try to eliminate that 
>> as a problem.  So now I'm serving the GFS filesystem directly through 
>> Samba.
>>
>> jonathan
>>
> 

-- 
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request


From teigland at redhat.com  Tue Jan 25 04:06:24 2005
From: teigland at redhat.com (David Teigland)
Date: Tue, 25 Jan 2005 12:06:24 +0800
Subject: [Linux-cluster] multipath/gfs lockout under heavy write
In-Reply-To: <1106596648.13534.79.camel@laza.eunet.yu>
References: <1106596648.13534.79.camel@laza.eunet.yu>
Message-ID: <20050125040624.GB5786@redhat.com>

On Mon, Jan 24, 2005 at 08:57:28PM +0100, Lazar Obradovic wrote:
 
> Since both LVs are a part of same VG (and, thus, are using the same
> physical device seen over multipath), I'd guess the problem is somewhere
> inside GFS, but the things that keep confusing me are: 
> 
> - those SCSI errors that look like multipath errors

The SCSI errors appear to be the root problem, not GFS.  I don't know what
multipath might have to do with it.

> - name 'diapered_dm-2' which I never saw before

In the past, GFS would immediately panic the machine when it saw i/o
errors.  Now it tries to shut down the bad fs instead.  After this happens
you should be able to unmount the offending fs, leave the cluster and
reboot the machine cleanly.

> - fenced not fencing obviously faulty node

In your situation, the node is running fine wrt the cluster so there's no
need to fence it.  GFS is just shutting down a faulty fs (doing this is
not always very "clean" and can produce a lot of errors/warnings on the
console.)

Perhaps we could reinstate an option to have gfs panic immediately when it
sees i/o errors instead of trying to shut down the problem fs.  In this
case, the panicked node would be "dead" and it would be fenced.

-- 
Dave Teigland  <teigland at redhat.com>


From mmatus at dinha.acms.arizona.edu  Tue Jan 25 08:41:54 2005
From: mmatus at dinha.acms.arizona.edu (Marcelo Matus)
Date: Tue, 25 Jan 2005 01:41:54 -0700
Subject: [Linux-cluster] multipath/gfs lockout under heavy write
In-Reply-To: <20050125040624.GB5786@redhat.com>
References: <1106596648.13534.79.camel@laza.eunet.yu>
	<20050125040624.GB5786@redhat.com>
Message-ID: <41F60652.7000908@acms.arizona.edu>

David Teigland wrote:

>On Mon, Jan 24, 2005 at 08:57:28PM +0100, Lazar Obradovic wrote:
> 
>  
>
>>Since both LVs are a part of same VG (and, thus, are using the same
>>physical device seen over multipath), I'd guess the problem is somewhere
>>inside GFS, but the things that keep confusing me are: 
>>
>>- those SCSI errors that look like multipath errors
>>    
>>
>
>The SCSI errors appear to be the root problem, not GFS.  I don't know what
>multipath might have to do with it.
>
>  
>
>>- name 'diapered_dm-2' which I never saw before
>>    
>>
>
>In the past, GFS would immediately panic the machine when it saw i/o
>errors.  Now it tries to shut down the bad fs instead.  After this happens
>you should be able to unmount the offending fs, leave the cluster and
>reboot the machine cleanly.
>  
>

I have a question about your last comment. We did the following 
experiment with GFS 6.0.2:

1.- Setup a cluster using a unique GFS server and gnbd device (lock_gulm 
master
     and gnbd_export in the same node).

2.- Fence out a node manually using fence_gnbd.

then we observed two cases:

1.- If the fenced machine is not mounting the GFS/gnbd fs, but only 
importing it, then we
     can properly either reboot or restart the GFS services with no problem.

2.- If the fenced machine is mounting the GFS/gnbd fs, but with no 
process using it,
     almost everything produces a kernel panic, even just unmounting the 
unused fs.
     In fact the only thing that works, besides pushing the reset 
button, is 'reboot -f',
     which is almost the same.

So, when you say "In the past", do you refer to GFS 6.0.2 ?

>  
>
>>- fenced not fencing obviously faulty node
>>    
>>
>
>In your situation, the node is running fine wrt the cluster so there's no
>need to fence it.  GFS is just shutting down a faulty fs (doing this is
>not always very "clean" and can produce a lot of errors/warnings on the
>console.)
>
>Perhaps we could reinstate an option to have gfs panic immediately when it
>sees i/o errors instead of trying to shut down the problem fs.  In this
>case, the panicked node would be "dead" and it would be fenced.
>
>  
>


From teigland at redhat.com  Tue Jan 25 09:12:06 2005
From: teigland at redhat.com (David Teigland)
Date: Tue, 25 Jan 2005 17:12:06 +0800
Subject: [Linux-cluster] multipath/gfs lockout under heavy write
In-Reply-To: <41F60652.7000908@acms.arizona.edu>
References: <1106596648.13534.79.camel@laza.eunet.yu>
	<20050125040624.GB5786@redhat.com>
	<41F60652.7000908@acms.arizona.edu>
Message-ID: <20050125091206.GE5786@redhat.com>

On Tue, Jan 25, 2005 at 01:41:54AM -0700, Marcelo Matus wrote:

> >In the past, GFS would immediately panic the machine when it saw i/o
> >errors.  Now it tries to shut down the bad fs instead.  After this happens
> >you should be able to unmount the offending fs, leave the cluster and
> >reboot the machine cleanly.
> 
> I have a question about your last comment. We did the following 
> experiment with GFS 6.0.2:
> 
> 1.- Setup a cluster using a unique GFS server and gnbd device (lock_gulm 
> master and gnbd_export in the same node).
> 
> 2.- Fence out a node manually using fence_gnbd.
> 
> then we observed two cases:
> 
> 1.- If the fenced machine is not mounting the GFS/gnbd fs, but only
> importing it, then we can properly either reboot or restart the GFS
> services with no problem.
> 
> 2.- If the fenced machine is mounting the GFS/gnbd fs, but with no
> process using it, almost everything produces a kernel panic, even just
> unmounting the unused fs.  In fact the only thing that works, besides
> pushing the reset button, is 'reboot -f', which is almost the same.
> 
> So, when you say "In the past", do you refer to GFS 6.0.2 ?

I was actually referring to the code Lazar is using which is the next, as
yet unreleased, version of GFS from the public cvs.  Your situation could
be explained similarly, like this:

- running fence_gnbd causes the node to get i/o errors if it tries to use
  gnbd

- if the node has GFS mounted, GFS will try to use gnbd

- when GFS 6.0.2 sees i/o errors it will panic

If you don't have GFS mounted, the last two steps don't exist and there's
no panic.

-- 
Dave Teigland  <teigland at redhat.com>


From mmatus at dinha.acms.arizona.edu  Tue Jan 25 09:20:50 2005
From: mmatus at dinha.acms.arizona.edu (Marcelo Matus)
Date: Tue, 25 Jan 2005 02:20:50 -0700
Subject: [Linux-cluster] multipath/gfs lockout under heavy write
In-Reply-To: <20050125091206.GE5786@redhat.com>
References: <1106596648.13534.79.camel@laza.eunet.yu>	<20050125040624.GB5786@redhat.com>	<41F60652.7000908@acms.arizona.edu>
	<20050125091206.GE5786@redhat.com>
Message-ID: <41F60F72.6080900@acms.arizona.edu>

David Teigland wrote:

>On Tue, Jan 25, 2005 at 01:41:54AM -0700, Marcelo Matus wrote:
>
>  
>
>>>In the past, GFS would immediately panic the machine when it saw i/o
>>>errors.  Now it tries to shut down the bad fs instead.  After this happens
>>>you should be able to unmount the offending fs, leave the cluster and
>>>reboot the machine cleanly.
>>>      
>>>
>>I have a question about your last comment. We did the following 
>>experiment with GFS 6.0.2:
>>
>>1.- Setup a cluster using a unique GFS server and gnbd device (lock_gulm 
>>master and gnbd_export in the same node).
>>
>>2.- Fence out a node manually using fence_gnbd.
>>
>>then we observed two cases:
>>
>>1.- If the fenced machine is not mounting the GFS/gnbd fs, but only
>>importing it, then we can properly either reboot or restart the GFS
>>services with no problem.
>>
>>2.- If the fenced machine is mounting the GFS/gnbd fs, but with no
>>process using it, almost everything produces a kernel panic, even just
>>unmounting the unused fs.  In fact the only thing that works, besides
>>pushing the reset button, is 'reboot -f', which is almost the same.
>>
>>So, when you say "In the past", do you refer to GFS 6.0.2 ?
>>    
>>
>
>I was actually referring to the code Lazar is using which is the next, as
>yet unreleased, version of GFS from the public cvs.  Your situation could
>be explained similarly, like this:
>
>- running fence_gnbd causes the node to get i/o errors if it tries to use
>  gnbd
>
>- if the node has GFS mounted, GFS will try to use gnbd
>
>- when GFS 6.0.2 sees i/o errors it will panic
>
>If you don't have GFS mounted, the last two steps don't exist and there's
>no panic.
>
>  
>
Thanks, that clarify to us that we don't have any error in our 
configuration :).

Then, the question is:

the new behaviour as you described, will be only present in the CVS 
version (kernel 2.6)
or it will be also back ported to the current GFS 6.0.2 version (kernel 
2.4) ?

Marcelo


From mshk_00 at hotmail.com  Tue Jan 25 10:18:37 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Tue, 25 Jan 2005 11:18:37 +0100
Subject: [Linux-cluster] GFS and HEARTBEAT
Message-ID: <BAY20-F14C4FCCE7F5885EB4FE99A8D860@phx.gbl>

Hi, I have a doubt (among many).

   How can  I  stablish heartbear with GFS???

   I have two nodes connected through ethernet, both nodes are servers 
lock_gulmd (I have installed GFS 6.0.0-7.1 over my kernel 2.4. 21-15.0.4.EL- 
I am using Red Hat Enterprise v.3).
   In the file CLUSTER.CCS I have defined three nodes (the third never take 
part in the cluster),
   the three nodes too are defined in the file NODES.CCS, the method of 
fencing that I have defined is
   MANUAL. I would like to know how I can install heartbeat in my system:

   Has GFS any mechanism that permit run heartbeat??

   Why GFS can add parameters for heartbeat in the file CLUSTER.CCS??

  What relation have this parameter with the process heartbeat??
  I have to download and install heartbeat from ..
                        http://linux-ha.org/download ??????????'
  or GFS incorporates any ???

  Thanks for all,
       maria.

_________________________________________________________________
Descarga gratis la Barra de Herramientas de MSN 
http://www.msn.es/usuario/busqueda/barra?XAPID=2031&DI=1055&SU=http%3A//www.hotmail.com&HL=LINKTAG1OPENINGTEXT_MSNBH


From nigel.jewell at pixexcel.co.uk  Tue Jan 25 10:43:36 2005
From: nigel.jewell at pixexcel.co.uk (Nigel Jewell)
Date: Tue, 25 Jan 2005 10:43:36 +0000
Subject: [Linux-cluster] GNBD & Network Outage
Message-ID: <41F622D8.1060102@pixexcel.co.uk>

Dear all,

We've been looking at the issues of using GNBD to provide access to a 
block device on a secondary installation and we've hit a brick wall.  I 
was wondering if anyone had seen the same behaviour

On host "A" we do:

    gnbd_export -d /dev/sda2 -e foo -c

On host "B" we do:

    gnbd_import -i A

... and as you would expect /dev/gnbd/foo appears on B and is usable.

We have no other aspects of GFS in use.

Now - in order for this to be useful, we've been testing the effects of 
using GNBD if there is a LAN outage.  If we write a big file to a 
mounted file system on B:/dev/gnbd/foo and pull out the LAN cable 
halfway through the data being synced to A, host B never gives up trying 
to contact A.  In fact, if you plug in the cable 10 minutes later the 
sync recovers.

Now - on the surface - this doesn't seem like a big problem, but it is 
when you try and use the imported device alongside software RAID or when 
you want to do something "normal" like reboot the box.  Rebooting just 
stops when it trys to unmount the file systems.

We want to use B:/dev/gndb/foo alongside a local partition on B and 
create a RAID-1 using mdadm.  In the same scenario (where the LAN cable 
is pulled), the md device on B completely stops all of the IO on the 
machine because (presumably) the md software is trying to write to the 
gnbd device ... which is forever trying to contact host A ... and of 
course never gives up.  It would be nice if it did give up and the md 
software continued the md device in degraded mode.

So the question is this (got there in the end).  Can anyone suggest a 
solution and/or alternative/workaround?  Is it possible to specify a 
time-out for the GNBD import/export for when the LAN does die?

Any ideas?

Regards,

-- 
Nige.

PixExcel Limited
URL: http://www.pixexcel.co.uk
MSN: nigel.jewell at pixexcel.co.uk


From mmonge at gmail.com  Tue Jan 25 10:56:07 2005
From: mmonge at gmail.com (Marcos Monge)
Date: Tue, 25 Jan 2005 11:56:07 +0100
Subject: [Linux-cluster] RH Cluster Suite without raw shared disk
Message-ID: <48e9aa4e05012502564c869da2@mail.gmail.com>

Hi

There is anyway to create a cluster without the two shared partitions
in shared disk?

It's possbile, for example, use a NFS Filer (netapp) as the shared
disk in some way?

In my case, I want to do a cluster of 2 nodes, with a nfs filer
sharing the aplication data, and also, if possible, the cluster status
information, without using a SCSI/HBA shared system.

Thanks in advance
Marcos


From fajar at telkom.co.id  Tue Jan 25 11:35:54 2005
From: fajar at telkom.co.id (Fajar A. Nugraha)
Date: Tue, 25 Jan 2005 18:35:54 +0700
Subject: [Linux-cluster] RH Cluster Suite without raw shared disk
In-Reply-To: <48e9aa4e05012502564c869da2@mail.gmail.com>
References: <48e9aa4e05012502564c869da2@mail.gmail.com>
Message-ID: <41F62F1A.8090606@telkom.co.id>

Marcos Monge wrote:

>Hi
>
>There is anyway to create a cluster without the two shared partitions
>in shared disk?
>
>  
>
If you use http://sources.redhat.com/cluster/, then the answer is yes.
I have tested it. The shared device is located on gnbd device exported 
by another machine.
I'm using it as an alternative for NFS.
I can even run Xen (http://www.cl.cam.ac.uk/Research/SRG/netos/xen/) 
domains on it,
and have live migration feature.

>It's possbile, for example, use a NFS Filer (netapp) as the shared
>disk in some way?
>
>  
>
Sorry, haven't tried it yet.

Regards,

Fajar


From mtilstra at redhat.com  Tue Jan 25 14:36:39 2005
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Tue, 25 Jan 2005 08:36:39 -0600
Subject: [Linux-cluster] RH Cluster Suite without raw shared disk
In-Reply-To: <48e9aa4e05012502564c869da2@mail.gmail.com>
References: <48e9aa4e05012502564c869da2@mail.gmail.com>
Message-ID: <20050125143639.GA16148@redhat.com>

On Tue, Jan 25, 2005 at 11:56:07AM +0100, Marcos Monge wrote:
> It's possbile, for example, use a NFS Filer (netapp) as the shared
> disk in some way?

If you set it up to export iscsi devices, you can put gfs onto that.

-- 
Michael Conrad Tadpol Tilstra
I know that I know just enough to know how much more there is to know. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20050125/43a7c7a2/attachment.sig>

From bmarzins at redhat.com  Tue Jan 25 20:55:38 2005
From: bmarzins at redhat.com (Benjamin Marzinski)
Date: Tue, 25 Jan 2005 14:55:38 -0600
Subject: [Linux-cluster] GNBD & Network Outage
In-Reply-To: <41F622D8.1060102@pixexcel.co.uk>
References: <41F622D8.1060102@pixexcel.co.uk>
Message-ID: <20050125205538.GD13289@phlogiston.msp.redhat.com>

On Tue, Jan 25, 2005 at 10:43:36AM +0000, Nigel Jewell wrote:
> Dear all,
> 
> We've been looking at the issues of using GNBD to provide access to a 
> block device on a secondary installation and we've hit a brick wall.  I 
> was wondering if anyone had seen the same behaviour
> 
> On host "A" we do:
> 
>    gnbd_export -d /dev/sda2 -e foo -c
> 
> On host "B" we do:
> 
>    gnbd_import -i A
> 
> ... and as you would expect /dev/gnbd/foo appears on B and is usable.
> 
> We have no other aspects of GFS in use.
> 
> Now - in order for this to be useful, we've been testing the effects of 
> using GNBD if there is a LAN outage.  If we write a big file to a 
> mounted file system on B:/dev/gnbd/foo and pull out the LAN cable 
> halfway through the data being synced to A, host B never gives up trying 
> to contact A.  In fact, if you plug in the cable 10 minutes later the 
> sync recovers.
> 
> Now - on the surface - this doesn't seem like a big problem, but it is 
> when you try and use the imported device alongside software RAID or when 
> you want to do something "normal" like reboot the box.  Rebooting just 
> stops when it trys to unmount the file systems.
> 
> We want to use B:/dev/gndb/foo alongside a local partition on B and 
> create a RAID-1 using mdadm.  In the same scenario (where the LAN cable 
> is pulled), the md device on B completely stops all of the IO on the 
> machine because (presumably) the md software is trying to write to the 
> gnbd device ... which is forever trying to contact host A ... and of 
> course never gives up.  It would be nice if it did give up and the md 
> software continued the md device in degraded mode.
> 
> So the question is this (got there in the end).  Can anyone suggest a 
> solution and/or alternative/workaround?  Is it possible to specify a 
> time-out for the GNBD import/export for when the LAN does die?

Sure. You see the -c in you export line.  Don't put it there.  That puts
the device in (the very poorly named) uncached mode.  This does two things.
One: It causes the server to use direct IO to write to the exported device,
so your read performance will take a hit.  Two: It will time out after
a period (default to 10 sec).  After gnbd times out, it must be able to fence
the server before it will let the requests fail.  This is so that you know
that the server isn't simply stalled and might write out the requests later
(if gnbd failed out, and the requests were rerouted to the backend storage over
another gnbd server, if the first server wrote it's requests out later, it
could cause data corruption).

This means that to run in uncached mode, you need to have a cluster manager and
fencing devices, which I'm not certain that you have.

I've got some questions about your setup.  Will this be part of a clustered
filesystem setup? If it will, I see some problems with your mirror.  When
other nodes (including the gnbd server node A) write to the exported device,
these writes will not appear on the local partion of B.  So won't your mirror
get out of sync?  If only B will write to the exported device, (and that's
the only way I see this working) you can probably get by with nbd, which
simply fails out if it loses connection.

There is a cluster mirror project in the works.  When that is done, you would
be able to have node B gnbd export it's local partition, and then run a mirror
on top of the device exported from A and the device exported from B, which
all nodes could access and would stay in sync. But this project isn't finished
yet.
 
-Ben

> Any ideas?
> 
> Regards,
> 
> -- 
> Nige.
> 
> PixExcel Limited
> URL: http://www.pixexcel.co.uk
> MSN: nigel.jewell at pixexcel.co.uk
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster


From rryll at yahoo.com  Tue Jan 25 23:33:52 2005
From: rryll at yahoo.com (Darryll Napolis)
Date: Tue, 25 Jan 2005 15:33:52 -0800 (PST)
Subject: [Linux-cluster] ga1 clusvcmgrd[27878]: <emerg> readServiceBlock:
	Service number mismatch 4, 6.
Message-ID: <20050125233353.83434.qmail@web51906.mail.yahoo.com>

Using RHEL3 cluster suite, i've been getting the
messages listed below in my /var/log/messages:

ga1 clusvcmgrd[27878]: <emerg> readServiceBlock:
Service number mismatch 4, 6.

anybody knows what does that mean? Any inputs are
greatly appreciated.  Thanks


__________________________________ 
Do you Yahoo!? 
Yahoo! Mail - You care about security. So do we. 
http://promotions.yahoo.com/new_mail


From jaime at iaa.es  Wed Jan 26 16:27:01 2005
From: jaime at iaa.es (Jaime Perea)
Date: Wed, 26 Jan 2005 17:27:01 +0100
Subject: [Linux-cluster] kernel versions and gfs
In-Reply-To: <20050125233353.83434.qmail@web51906.mail.yahoo.com>
References: <20050125233353.83434.qmail@web51906.mail.yahoo.com>
Message-ID: <200501261727.01544.jaime@iaa.es>

Hi everybody

I have a strange problem. I need to work with some software that refuses
to work under a version of the  kernel > 2.6.8. On the other hand I would 
like to install gfs and all the other related stuff. I used the last 
version from the cvs and it compiles well under kernel 2.6.10 but I cannot
compile it under the 2.6.8 kernel. 

Something like

/home/jaime/clu/cluster-2.6.8.1/gfs-kernel/src/gfs/ops_file.c:1670: error: 
unknown field `flock' specified in initializer
/home/jaime/clu/cluster-2.6.8.1/gfs-kernel/src/gfs/ops_file.c:1670: 
warning: initialization from incompatible pointer type
make[5]: *** 
[/home/jaime/clu/cluster-2.6.8.1/gfs-kernel/src/gfs/ops_file.o] Error 1
make[4]: *** [_module_/home/jaime/clu/cluster-2.6.8.1/gfs-kernel/src/gfs] 
Error 2

Do I need a specific version of gfs for the 2.6.8 version of the kernel?

Thanks 
-- 

           Jaime D. Perea Duarte. <jaime at iaa dot es>
             Linux registered user #10472

           Dep. Astrofisica Extragalactica.
           Instituto de Astrofisica de Andalucia (CSIC)
           Apdo. 3004, 18080 Granada, Spain. 


From lhh at redhat.com  Wed Jan 26 19:49:59 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 26 Jan 2005 14:49:59 -0500
Subject: [Linux-cluster] ga1 clusvcmgrd[27878]: <emerg>
	readServiceBlock: Service number mismatch 4, 6.
In-Reply-To: <20050125233353.83434.qmail@web51906.mail.yahoo.com>
References: <20050125233353.83434.qmail@web51906.mail.yahoo.com>
Message-ID: <1106768999.16910.112.camel@ayanami.boston.redhat.com>

On Tue, 2005-01-25 at 15:33 -0800, Darryll Napolis wrote:
> Using RHEL3 cluster suite, i've been getting the
> messages listed below in my /var/log/messages:
> 
> ga1 clusvcmgrd[27878]: <emerg> readServiceBlock:
> Service number mismatch 4, 6.
> 
> anybody knows what does that mean? Any inputs are
> greatly appreciated.  Thanks

It's an anomaly, but is mostly noise.

See:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=120934

-- Lon


From lhh at redhat.com  Thu Jan 27 18:36:38 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 27 Jan 2005 13:36:38 -0500
Subject: [Linux-cluster] GFS and HEARTBEAT
In-Reply-To: <BAY20-F14C4FCCE7F5885EB4FE99A8D860@phx.gbl>
References: <BAY20-F14C4FCCE7F5885EB4FE99A8D860@phx.gbl>
Message-ID: <1106850998.16910.168.camel@ayanami.boston.redhat.com>

On Tue, 2005-01-25 at 11:18 +0100, maria perez wrote:

>   What relation have this parameter with the process heartbeat??
>   I have to download and install heartbeat from ..
>                         http://linux-ha.org/download ??????????'
>   or GFS incorporates any ???

Hi Maria,

No work has been done to integrate GFS with Heartbeat.  There is a lot
of common work being done between the linux-cluster project and the 
linux-ha project, so this may change in the future.

However, it should be possible to run GFS as the backend store for
services/resource groups managed by heartbeat, but be aware that
heartbeat's notion of membership and GFS's may not always coincide (e.g.
one might think node X is offline, while the other believes it to be
online).

-- Lon


From lhh at redhat.com  Thu Jan 27 18:49:12 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 27 Jan 2005 13:49:12 -0500
Subject: [Linux-cluster] GFS and HEARTBEAT
In-Reply-To: <1106850998.16910.168.camel@ayanami.boston.redhat.com>
References: <BAY20-F14C4FCCE7F5885EB4FE99A8D860@phx.gbl>
	<1106850998.16910.168.camel@ayanami.boston.redhat.com>
Message-ID: <1106851752.16910.181.camel@ayanami.boston.redhat.com>

On Thu, 2005-01-27 at 13:36 -0500, Lon Hohberger wrote:

> However, it should be possible to run GFS as the backend store for
> services/resource groups managed by heartbeat, but be aware that
> heartbeat's notion of membership and GFS's may not always coincide (e.g.
> one might think node X is offline, while the other believes it to be
> online).

Sorry, forgot to explain the practical implication.  As an example:

1 - Heartbeat detects that node A is offline.
2 - Heartbeat STONITHs node A and takes over services.
3 - (30 seconds pass)
4 - GFS detects that node A is offline.
5 - GFS fences node A and recovers node A's journal.
6 - Cluster in sane state.

In the above case, it's mostly just annoying for the system
administrator and causes node A to be unavailable for longer than
necessary.

(If GFS locks are held when node A dies, then it becomes a bit more
complicated.)

-- Lon


From daniel at osdl.org  Thu Jan 27 19:21:15 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Thu, 27 Jan 2005 11:21:15 -0800
Subject: [Linux-cluster] [PATCH] to fix ccs_tool ld error on latest cvs
In-Reply-To: <200501241018.49756.jaime@iaa.es>
References: <1106351208.14739.8.camel@ibm-c.pdx.osdl.net>
	<200501241018.49756.jaime@iaa.es>
Message-ID: <1106853675.9346.42.camel@ibm-c.pdx.osdl.net>

On Mon, 2005-01-24 at 01:18, Jaime Perea wrote:
> Hi everybody,
> 
> My first posting!
> 
> Perhaps doing 
> LDFLAGS="-lpthread"  make 
> 
> could work. 

Thanks.  Adding -lpthread in the Makefile fixed the build problem.

Here's the patch that fixed the problem.

--- cluster.orig/ccs/ccs_tool/Makefile	2005-01-27 11:15:49.385135157 -0800
+++ cluster/ccs/ccs_tool/Makefile	2005-01-24 15:29:38.000000000 -0800
@@ -25,7 +25,7 @@
 endif
 
 LDFLAGS+= -L${ccs_libdir} `xml2-config --libs` -L${libdir}
-LOADLIBES+= -lccs -lmagma -lmagmamsg -ldl
+LOADLIBES+= -lccs -lmagma -lmagmamsg -ldl -lpthread
 
 all: ccs_tool
 

From marco.yanez at hp.com  Thu Jan 27 23:31:56 2005
From: marco.yanez at hp.com (Yanez, Marco Antonio)
Date: Thu, 27 Jan 2005 17:31:56 -0600
Subject: [Linux-cluster] Question about GFS with 2 nodes
Message-ID: <FFA1D56B2CFF184191BC2B8E9A9986EB024CE6AD@sfeexc01.americas.cpqcorp.net>

Hi,

I have a 2 nodes (one Master and other Secondary) in a GFS
configuration.

We investigate that only 3 nodes GFS can give a continuos operation in
automatic form. But my question is:

In a 2 nodes GFS configuration: if my Master node doesn't work for any
situation, How I can configure the secondary node (manually) to continue
with my normal operation while fix the Master node?

Is is possible?

I appreciate all yoour help on this.

Best Regards.

Marco


From daniel at osdl.org  Fri Jan 28 01:41:10 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Thu, 27 Jan 2005 17:41:10 -0800
Subject: [Linux-cluster] umount hang on 2.6.10 and latest GFS
Message-ID: <1106876470.20799.13.camel@ibm-c.pdx.osdl.net>

I hit a umount hang running my tests.  It was running with
2 nodes mounted cl030 and cl031.  It has finished a test
and is unmounting cl030 when it hung.  cl031 seems fine
with the gfs file system still mounted.

The gfs file system is unmounted (not in /proc/mounts), but
the umount is hung trying to stop dlm_astd

Here's the stack trace:

umount        D 00000008     0 10453  10447                     (NOTLB)
cdaa8de4 00000082 cdaa8dd4 00000008 00000001 000c0000 00000008 00000002
       c1bce798 00000286 e8f782e0 cdaa8dc4 c0116871 e9db55e0 960546f9 c170ef60
       00000000 0001fba3 0167aae6 00005e6b f74f8080 f74f81ec c170ef60 00000000
Call Trace:
 [<c03ce814>] wait_for_completion+0xa4/0xe0
 [<c01326a5>] kthread_stop+0x85/0xae
 [<f8aca033>] astd_stop+0x13/0x32 [dlm]
 [<f8ad1e51>] dlm_release+0x91/0xa0 [dlm]
 [<f8ad2832>] release_lockspace+0x222/0x2f0 [dlm]
 [<f8c2f22c>] release_gdlm+0x1c/0x30 [lock_dlm]
 [<f8c2f55f>] lm_dlm_unmount+0x4f/0x70 [lock_dlm]
 [<f881242c>] lm_unmount+0x3c/0xa0 [lock_harness]
 [<f8fb60ef>] gfs_lm_unmount+0x2f/0x40 [gfs]
 [<f8fc62ab>] gfs_put_super+0x2fb/0x3a0 [gfs]
 [<c0165d67>] generic_shutdown_super+0x127/0x140
 [<f8fc337e>] gfs_kill_sb+0x2e/0x69 [gfs]
 [<c0165b71>] deactivate_super+0x81/0xa0
 [<c017c4dc>] sys_umount+0x3c/0xa0
 [<c017c559>] sys_oldumount+0x19/0x20
 [<c010323d>] sysenter_past_esp+0x52/0x75

dlm_astd      D 00000008     0 10264      6                3235 (L-TLB)
dc9c3ee8 00000046 dc9c3ed8 00000008 00000002 00000800 00000008 c8cc35e0
       f7bc0568 5f8a4c1c 0179a889 e4676c5a 00004b2d dc9c3f14 c051c000 c1716f60
       00000001 000001b0 0167c50b 00005e6b e9db55e0 e9db574c c1714060 00000000
Call Trace:
 [<c03cef7c>] rwsem_down_read_failed+0x9c/0x190
 [<f8aca119>] .text.lock.ast+0xc7/0x1de [dlm]
 [<f8ac9ea5>] dlm_astd+0x1e5/0x210 [dlm]
 [<c013245a>] kthread+0xba/0xc0
 [<c0101315>] kernel_thread_helper+0x5/0x10

So, it looks like dlm_astd is stuck on a down_read().

The only down_read I see is in process_asts().

	down_read(&ls->ls_in_recovery);

So, it looks block on recovery of the lockspace, but the
DLM is not listed in /proc/cluster/services and
/proc/cluster/dlm_locks shows no locks.

Full info available here:
http://developer.osdl.org/daniel/GFS/test.25jan2005/

Ideas?

Daniel


From teigland at redhat.com  Fri Jan 28 02:34:08 2005
From: teigland at redhat.com (David Teigland)
Date: Fri, 28 Jan 2005 10:34:08 +0800
Subject: [Linux-cluster] umount hang on 2.6.10 and latest GFS
In-Reply-To: <1106876470.20799.13.camel@ibm-c.pdx.osdl.net>
References: <1106876470.20799.13.camel@ibm-c.pdx.osdl.net>
Message-ID: <20050128023408.GB5298@redhat.com>


On Thu, Jan 27, 2005 at 05:41:10PM -0800, Daniel McNeil wrote:

> dlm_astd      D 00000008     0 10264      6                3235 (L-TLB)
> dc9c3ee8 00000046 dc9c3ed8 00000008 00000002 00000800 00000008 c8cc35e0
>        f7bc0568 5f8a4c1c 0179a889 e4676c5a 00004b2d dc9c3f14 c051c000 c1716f60
>        00000001 000001b0 0167c50b 00005e6b e9db55e0 e9db574c c1714060 00000000
> Call Trace:
>  [<c03cef7c>] rwsem_down_read_failed+0x9c/0x190
>  [<f8aca119>] .text.lock.ast+0xc7/0x1de [dlm]
>  [<f8ac9ea5>] dlm_astd+0x1e5/0x210 [dlm]
>  [<c013245a>] kthread+0xba/0xc0
>  [<c0101315>] kernel_thread_helper+0x5/0x10
> 
> So, it looks like dlm_astd is stuck on a down_read().
> 
> The only down_read I see is in process_asts().
> 
> 	down_read(&ls->ls_in_recovery);

Yep, that's it.  The ls struct is freed while dlm_astd is blocked there.
I checked in a fix for this a few days ago.

-- 
Dave Teigland  <teigland at redhat.com>


From lhh at redhat.com  Fri Jan 28 16:24:34 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 28 Jan 2005 11:24:34 -0500
Subject: [Linux-cluster] Question about GFS with 2 nodes
In-Reply-To: <FFA1D56B2CFF184191BC2B8E9A9986EB024CE6AD@sfeexc01.americas.cpqcorp.net>
References: <FFA1D56B2CFF184191BC2B8E9A9986EB024CE6AD@sfeexc01.americas.cpqcorp.net>
Message-ID: <1106929474.16910.241.camel@ayanami.boston.redhat.com>

On Thu, 2005-01-27 at 17:31 -0600, Yanez, Marco Antonio wrote:

> We investigate that only 3 nodes GFS can give a continuos operation in
> automatic form. But my question is:
> 
> In a 2 nodes GFS configuration: if my Master node doesn't work for any
> situation, How I can configure the secondary node (manually) to continue
> with my normal operation while fix the Master node?

> Is is possible?

Not with gulm.  Try running CMAN in 2-node mode instead.

-- Lon


From daniel at osdl.org  Sat Jan 29 00:51:46 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Fri, 28 Jan 2005 16:51:46 -0800
Subject: [Linux-cluster] build errors on the latest cvs
Message-ID: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>

I trying to re-build after update from cvs .

Using 'make' gets:

ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so.DEVEL
ln -snf libmagma.so.DEVEL.1106957305 libmagma.so
ln -snf libmagma_nt.so.DEVEL.1106957305 libmagma_nt.so
ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so
install -d /Views/redhat-cluster/cluster/build/lib
install -d /usr/lib
install: cannot change permissions of `/usr/lib': Operation not permitted
make[2]: *** [install] Error 1
make[2]: Leaving directory `/Views/redhat-cluster/cluster/magma/lib'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/Views/redhat-cluster/cluster/magma'

so it looks like slibdir is not being set right.

I tried building from a clean view and got the same error. :(

Running 'make install' as root does work with (with my patch
to add -pthread to the ccs_tool Makefile), but I like
building it all first before installing it.

Daniel


From rstevens at vitalstream.com  Sat Jan 29 01:08:51 2005
From: rstevens at vitalstream.com (Rick Stevens)
Date: Fri, 28 Jan 2005 17:08:51 -0800
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
Message-ID: <41FAE223.3020108@vitalstream.com>

Daniel McNeil wrote:
> I trying to re-build after update from cvs .
> 
> Using 'make' gets:
> 
> ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so.DEVEL
> ln -snf libmagma.so.DEVEL.1106957305 libmagma.so
> ln -snf libmagma_nt.so.DEVEL.1106957305 libmagma_nt.so
> ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so
> install -d /Views/redhat-cluster/cluster/build/lib
> install -d /usr/lib
> install: cannot change permissions of `/usr/lib': Operation not permitted
> make[2]: *** [install] Error 1
> make[2]: Leaving directory `/Views/redhat-cluster/cluster/magma/lib'
> make[1]: *** [install] Error 2
> make[1]: Leaving directory `/Views/redhat-cluster/cluster/magma'
> 
> so it looks like slibdir is not being set right.
> 
> I tried building from a clean view and got the same error. :(
> 
> Running 'make install' as root does work with (with my patch
> to add -pthread to the ccs_tool Makefile), but I like
> building it all first before installing it.

Of COURSE you can't change the permissions of /usr/lib as a normal,
mortal user.../usr/lib is owner: root, group: root.  That's why installs
HAVE to be done by root.  Mere mortals aren't allowed to screw with
important things like /usr/lib or /lib.
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-            I'm afraid my karma just ran over your dogma            -
----------------------------------------------------------------------


From daniel at osdl.org  Sat Jan 29 01:35:11 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Fri, 28 Jan 2005 17:35:11 -0800
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <41FAE223.3020108@vitalstream.com>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
Message-ID: <1106962511.20799.51.camel@ibm-c.pdx.osdl.net>

On Fri, 2005-01-28 at 17:08, Rick Stevens wrote:
> Daniel McNeil wrote:
> > I trying to re-build after update from cvs .
> > 
> > Using 'make' gets:
> > 
> > ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so.DEVEL
> > ln -snf libmagma.so.DEVEL.1106957305 libmagma.so
> > ln -snf libmagma_nt.so.DEVEL.1106957305 libmagma_nt.so
> > ln -snf libmagmamsg.so.DEVEL.1106957305 libmagmamsg.so
> > install -d /Views/redhat-cluster/cluster/build/lib
> > install -d /usr/lib
> > install: cannot change permissions of `/usr/lib': Operation not permitted
> > make[2]: *** [install] Error 1
> > make[2]: Leaving directory `/Views/redhat-cluster/cluster/magma/lib'
> > make[1]: *** [install] Error 2
> > make[1]: Leaving directory `/Views/redhat-cluster/cluster/magma'
> > 
> > so it looks like slibdir is not being set right.
> > 
> > I tried building from a clean view and got the same error. :(
> > 
> > Running 'make install' as root does work with (with my patch
> > to add -pthread to the ccs_tool Makefile), but I like
> > building it all first before installing it.
> 
> Of COURSE you can't change the permissions of /usr/lib as a normal,
> mortal user.../usr/lib is owner: root, group: root.  That's why installs
> HAVE to be done by root.  Mere mortals aren't allowed to screw with
> important things like /usr/lib or /lib.

I should have been more clear: when doing a "make"
it should not be touching anything like /usr/lib/ or /lib.
It is ok if 'make install' puts stuff in /usr/lib or /lib
and other places (and that part must be done as root).

It looks like the Makefile uses a prefix to a local directory.
'slibdir' is not using "/Views/redhat-cluster/cluster/build"
prefix that 'libdir' above did (see the install -d line above
the one that failed), so it does not build without being root.
That is the build problem.  Mere mortals should be able to build :)

Daniel


From daniel at osdl.org  Sat Jan 29 01:44:54 2005
From: daniel at osdl.org (Daniel McNeil)
Date: Fri, 28 Jan 2005 17:44:54 -0800
Subject: [Linux-cluster] build errors on the latest cvs AND ccsd
	doesn't run
In-Reply-To: <1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
	<1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
Message-ID: <1106963093.20799.59.camel@ibm-c.pdx.osdl.net>

I cannot get the latest cvs to run because ccsd complains:

[root at cl030 cluster]# ccsd
Failed to connect to cluster manager.
Hint: Magma plugins are not in the right spot.

Either it did not install stuff in the right spot or
it is looking in the wrong spot.

Any ideas?

Daniel

PS BTW, I did at first try running the updated kernel modules
with the old user-level tools and some other nodes running
code from a few days ago.  Cman gave me a verson mis-match.
So I updated all the nodes (kernel modules and user-level).
If there are version changes like this, it would be nice to
email a note to the gfs mailing list.


From sunjw at onewaveinc.com  Sat Jan 29 16:30:47 2005
From: sunjw at onewaveinc.com (=?gb2312?B?y++/oc6w?=)
Date: Sun, 30 Jan 2005 00:30:47 +0800
Subject: [Linux-cluster] fence problem
Message-ID: <SERVERlICSsiWdBdsnv0000091b@mail.onewaveinc.com>

Hello all,

I have brocade FC switch whose model is silkworm 3850,
which fence method can I use on GFS for kernel 2.6.9, 
and how to configure the file cluster.conf,how to configure the FC switch?

Thanks for any reply!
Best regards!
Luckey


From sunjw at onewaveinc.com  Sat Jan 29 16:14:48 2005
From: sunjw at onewaveinc.com (=?gb2312?B?y++/oc6w?=)
Date: Sun, 30 Jan 2005 00:14:48 +0800
Subject: [Linux-cluster] NFS over GFS problem
Message-ID: <SERVERO4OPXok84FTA70000090e@mail.onewaveinc.com>

Hello,all

I got the gfs' code from cvs on 2004-12-12, which was for kernel 2.6.9 ,
and I opened the NFS server on one GFS node, then I mounted the nfs filesystem
on some machines which did not in the cluster.The NFS server's OS is FC3, 
the NFS clients' OS is RH9. There were problems:

1. When I stoped the NFS server and umount the GFS with NFS client mounted, 
no error happened. but after I remounted the GFS , restarted the NFS server, remounted
the NFS,and umounted the NFS, stoped NFS server, then umount of GFS hang, system reboot
failed too.the dmesg is :
                      
 GFS: fsid=xxx:xxx.3: Unmount seems to be stalled. Dumping lock state...
 Glock (2, 26)
   gl_flags =
   gl_count = 2
   gl_state = 0
   lvb_count = 0
   object = yes
   aspace = 0
   reclaim = no
   Inode:
     num = 26/26
     type = 2
     i_count = 1
     i_flags =
     vnode = yes
 Glock (5, 26)
   gl_flags =
   gl_count = 2
   gl_state = 3
   lvb_count = 0
   object = yes
   aspace = no
   reclaim = no
   Holder
     owner = -1
     gh_state = 3
     gh_flags = 5 7
     error = 0
     gh_iflags = 1 6 7

what's the messages' meaning? It seams as if the mount/umount sequence is critical.

2. When two GFS nodes read the same file(s)(as 1~50 files) on the storage at the
same time, total disk IO performance is worse than one node's? When one GFS node and 
one NFS node(client) read the same file(s) ,the NFS node's performance is nearly zero? 
What's worse, the command "ls" looks like blocked on the GFS directory 
on both GFS nodes and NFS nodes on above cases. So how can I speed up the command "ls",
or may be the system call "stat"?  The GFS filesystem use the "lock_dlm" lock protocol. 
Will "lock_gulm" protocol improve the status, or any other? Are there any 
gfs tune options or mount options to resolve the problem?  

Thanks for any reply!
Best regards! 
Luckey


From lhh at redhat.com  Mon Jan 31 15:00:41 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 31 Jan 2005 10:00:41 -0500
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
	<1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
Message-ID: <1107183641.22835.27.camel@ayanami.boston.redhat.com>

On Fri, 2005-01-28 at 17:35 -0800, Daniel McNeil wrote:
> It is ok if 'make install' puts stuff in /usr/lib or /lib
> and other places (and that part must be done as root).
> 
> It looks like the Makefile uses a prefix to a local directory.
> 'slibdir' is not using "/Views/redhat-cluster/cluster/build"
> prefix that 'libdir' above did (see the install -d line above
> the one that failed), so it does not build without being root.
> That is the build problem.  Mere mortals should be able to build :)

You're correct.

Fixing.

-- Lon


From lhh at redhat.com  Mon Jan 31 15:08:18 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 31 Jan 2005 10:08:18 -0500
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
Message-ID: <1107184098.22835.31.camel@ayanami.boston.redhat.com>

On Fri, 2005-01-28 at 16:51 -0800, Daniel McNeil wrote:
> install -d /Views/redhat-cluster/cluster/build/lib
> install -d /usr/lib

That shouldn't happen during 'make'.

Did you pass any flags to configure?  It's working for me.

-- Lon


From lhh at redhat.com  Mon Jan 31 15:13:51 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 31 Jan 2005 10:13:51 -0500
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
	<1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
Message-ID: <1107184431.22835.35.camel@ayanami.boston.redhat.com>

On Fri, 2005-01-28 at 17:35 -0800, Daniel McNeil wrote:
> It looks like the Makefile uses a prefix to a local directory.
> 'slibdir' is not using "/Views/redhat-cluster/cluster/build"
> prefix that 'libdir' above did (see the install -d line above
> the one that failed), so it does not build without being root.
> That is the build problem.  Mere mortals should be able to build :)

Oh.

I know what happened...  The top-level configure doesn't know about
slibdir.

The magma and magma-plugins builds are fine, but the top level causes
the others to bomb.

-- Lon


From lhh at redhat.com  Mon Jan 31 15:19:19 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 31 Jan 2005 10:19:19 -0500
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1107184431.22835.35.camel@ayanami.boston.redhat.com>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
	<1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
	<1107184431.22835.35.camel@ayanami.boston.redhat.com>
Message-ID: <1107184759.22835.40.camel@ayanami.boston.redhat.com>

On Mon, 2005-01-31 at 10:13 -0500, Lon Hohberger wrote:
> I know what happened...  The top-level configure doesn't know about
> slibdir.
> 
> The magma and magma-plugins builds are fine, but the top level causes
> the others to bomb.

Erm, no, that's not it either.

It works fine for me:

ln -snf libmagma_nt.so.DEVEL.1107184454 libmagma_nt.so.DEVEL
ld -shared -soname libmagmamsg.so.DEVEL -o
libmagmamsg.so.DEVEL.1107184454 message.o fdops.o -lc
ln -snf libmagmamsg.so.DEVEL.1107184454 libmagmamsg.so.DEVEL
ln -snf libmagma.so.DEVEL.1107184454 libmagma.so
ln -snf libmagma_nt.so.DEVEL.1107184454 libmagma_nt.so
ln -snf libmagmamsg.so.DEVEL.1107184454 libmagmamsg.so
install -d /tmp/lon/usr/lib
install -d /tmp/lon/usr/lib

In the configure script, it substitutes them the same way; this is
rather strange...

-- Lon


From lhh at redhat.com  Mon Jan 31 15:26:23 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 31 Jan 2005 10:26:23 -0500
Subject: [Linux-cluster] build errors on the latest cvs
In-Reply-To: <1107184759.22835.40.camel@ayanami.boston.redhat.com>
References: <1106959906.20799.36.camel@ibm-c.pdx.osdl.net>
	<41FAE223.3020108@vitalstream.com>
	<1106962511.20799.51.camel@ibm-c.pdx.osdl.net>
	<1107184431.22835.35.camel@ayanami.boston.redhat.com>
	<1107184759.22835.40.camel@ayanami.boston.redhat.com>
Message-ID: <1107185183.22835.43.camel@ayanami.boston.redhat.com>

On Mon, 2005-01-31 at 10:19 -0500, Lon Hohberger wrote:

> In the configure script, it substitutes them the same way; this is
> rather strange...

Ok, the coffee has hit.

Fix in pool.

-- Lon


From amanthei at redhat.com  Mon Jan 31 16:06:47 2005
From: amanthei at redhat.com (Adam Manthei)
Date: Mon, 31 Jan 2005 10:06:47 -0600
Subject: [Linux-cluster] fence problem
In-Reply-To: <SERVERlICSsiWdBdsnv0000091b@mail.onewaveinc.com>
References: <SERVERlICSsiWdBdsnv0000091b@mail.onewaveinc.com>
Message-ID: <20050131160647.GJ10537@redhat.com>

On Sun, Jan 30, 2005 at 12:30:47AM +0800, ?????? wrote:
> Hello all,
> 
> I have brocade FC switch whose model is silkworm 3850,
> which fence method can I use on GFS for kernel 2.6.9, 
> and how to configure the file cluster.conf,how to configure the FC switch?

fence_brocade will probably work.  The parameters are documented in the man
page.

-- 
Adam Manthei  <amanthei at redhat.com>